SugaFormer: Super-class guided Transformer for Zero-Shot Attribute Classification (AAAI '25)
LLaMo: Large Language Model-based Molecular Graph Assistant (NeurIPS '24)
VidChain: Chain-of-Tasks with Metric-based Direct Preference Optimization for Dense Video Captioning (AAAI '25)
DialogGSR: Generative Subgraph Retrieval for Knowledge Graph–Grounded Dialog Generation (EMNLP '24)
A superintelligence is a hypothetical agent that possesses intelligence far surpassing that of the brightest and most gifted human minds; this hypothetical capability is also often referred to as Artificial General Intelligence (AGI). There have been many milestones toward this goal, including GPT-4, ChatGPT, CLIP, and Flamingo, which have shown remarkable performance on diverse tasks, even without task-specific training, compared to task-specific weak AI models. These AGI/foundation models have since acquired multi-modality (images, videos, knowledge graphs, etc.), achieving a deeper understanding of the world. Our overarching goal is to develop a general-purpose learning system capable of learning and performing unseen tasks using every modality it can utilize.
Our related publications
[AAAI '25] Super-class guided Transformer for Zero-Shot Attribute Classification
InvBO: Inversion-based Latent Bayesian Optimization (NeurIPS '24)
NF-BO: Latent Bayesian Optimization via Autoregressive Normalizing Flows (ICLR '25, Oral presentation)
LLaMo: Large Language Model-based Molecular Graph Assistant (NeurIPS'24)
In modern data analysis, highly structured data arises frequently and can be viewed as data on non-Euclidean spaces (e.g., graphs, Riemannian manifolds, data manifolds, and function spaces). We focus on AI for Science, applying graph neural networks (GNNs) to areas such as molecular language models, molecule optimization for drug discovery, and weather forecasting. Our goal is to leverage these techniques to model complex scientific data and drive innovation in scientific research.
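As a minimal illustration of the message passing at the heart of GNNs, the sketch below implements one mean-aggregation layer on a toy graph. The function name, toy graph, and weights are assumptions made for this example, not any specific published model of ours.

```python
import numpy as np

def gnn_layer(A, X, W):
    """One round of mean-aggregation message passing.

    A: (n, n) adjacency matrix with self-loops added
    X: (n, d) node features
    W: (d, k) weight matrix (learnable in a real model)
    """
    deg = A.sum(axis=1, keepdims=True)  # node degrees (incl. self-loop)
    H = (A @ X) / deg                   # average each node's neighborhood
    return np.maximum(H @ W, 0.0)       # linear transform + ReLU

# Toy 3-node path graph 0 - 1 - 2, with self-loops on the diagonal
A = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 1, 1]], dtype=float)
X = np.eye(3)            # one-hot node features
W = np.ones((3, 2))      # fixed weights, just for the demo
H = gnn_layer(A, X, W)   # (3, 2) updated node features
```

Stacking several such layers lets information propagate across multi-hop neighborhoods, which is the basic mechanism behind molecular-graph models.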
Our related publications
[TPAMI] Deformable Graph Transformer
NF-BO: Latent Bayesian Optimization via Autoregressive Normalizing Flows (ICLR '25, Oral presentation)
CAF: Constant Acceleration Flow (NeurIPS '24)
Generative models are a cornerstone of artificial intelligence, serving as powerful engines for innovation in drug discovery, image generation, and video generation. In drug discovery, these models leverage machine learning techniques such as Bayesian optimization to design novel molecules, accelerating the identification of potential therapeutic compounds. In image generation, techniques such as diffusion models produce realistic images, enabling creative expression and practical applications across diverse fields. With their ability to generate new data samples and push the boundaries of what is possible, generative models continue to reshape industries and drive progress in science and technology.
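To make the diffusion idea concrete, here is a minimal sketch of the forward (noising) process, which corrupts data toward Gaussian noise in closed form; the reverse, learned denoising process is what actually generates samples. The linear schedule and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 1000
betas = np.linspace(1e-4, 0.02, T)    # assumed linear noise schedule
alphas_bar = np.cumprod(1.0 - betas)  # cumulative signal retention

def q_sample(x0, t):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(a_bar_t) x_0, (1 - a_bar_t) I)."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * noise

x0 = np.ones((4, 4))                  # toy "image"
x_early = q_sample(x0, 10)            # early step: close to x0
x_late = q_sample(x0, 999)            # late step: nearly pure noise
```

A trained diffusion model learns to invert this corruption step by step, turning noise back into data.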
Our related publications
[ICLR '25] Latent Bayesian Optimization via Autoregressive Normalizing Flows (Oral presentation, top 1.8%)
High-level computer vision enables a deeper understanding of the visual world. Object recognition systems detect objects in images and videos, offering basic information about which objects appear in a scene and how many instances are present. But this information may not be sufficient for building personalized, automated systems for smart cities: smart homes, smart offices, and hospitals. Without a deep understanding of the interaction between humans and objects, it is hard to grasp the context of a scene and what kind of services are needed. Scene understanding is one topic that studies such interactions and generates metadata such as scene graphs, which in turn enables visual question answering (VQA). Security cameras are pervasive in modern cities, and computer vision aids anomaly detection (floods, wildfires, dangerous wild animals) and can even estimate traffic and temperature. We study algorithms that offer a more accurate and deeper understanding of the visual world and help people live safer, smarter lives.
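A scene graph can be thought of as a set of (subject, predicate, object) triples extracted from an image, and simple queries over those triples already support basic question answering. The toy graph and function below are made up for illustration only.

```python
# A tiny hand-written scene graph: each entry is (subject, predicate, object).
scene_graph = [
    ("person", "holding", "cup"),
    ("cup", "on", "table"),
    ("dog", "next_to", "person"),
]

def query(graph, predicate=None, obj=None):
    """Return subjects whose triples match the given predicate and/or object."""
    return [s for s, p, o in graph
            if (predicate is None or p == predicate)
            and (obj is None or o == obj)]

# "What is on the table?"
answer = query(scene_graph, predicate="on", obj="table")  # ['cup']
```

Real VQA systems predict such graphs from pixels and reason over them with learned models, but the metadata they produce has this triple-structured form.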
Our related publications
[CVPR '25] EfficientViM: Efficient Vision Mamba with Hidden State Mixer based State Space Duality
UP-NeRF: Unconstrained Pose-Prior-Free Neural Radiance Fields (NeurIPS '23)
Semantic-Aware Implicit Template Learning via Part Deformation Consistency (ICCV '23)
3D data (e.g., point clouds, voxels, polygonal meshes) are crucial to diverse fields such as robotics, autonomous driving, AI drones, medical data analysis, and scene reconstruction. We are interested in 3D computer vision and 3D deep learning on such data, which has more complex geometry than 2D data. Shape classification, indoor/outdoor scene semantic segmentation, and shape correspondence/registration are representative tasks for point cloud data. In addition, we are interested in implicit neural representations (INRs), an emerging paradigm that offers a novel approach to representing complex geometric shapes and scenes.
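To make the INR idea concrete, the sketch below fits a tiny coordinate-to-value MLP to a 1-D signal with hand-written gradient descent, so the network weights themselves become the representation of the signal. The architecture, sizes, and hyperparameters are illustrative assumptions, not a specific published model.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 64)[:, None]  # query coordinates
y = np.sin(np.pi * x)                # target signal to encode in the network

# Two-layer MLP: coordinate -> 32 hidden units -> value
W1, b1 = rng.normal(0, 1.0, (1, 32)), np.zeros(32)
W2, b2 = rng.normal(0, 0.1, (32, 1)), np.zeros(1)

mse0 = float(np.mean((np.tanh(x @ W1 + b1) @ W2 + b2 - y) ** 2))

lr = 1e-2
for _ in range(2000):
    h = np.tanh(x @ W1 + b1)             # hidden activations
    pred = h @ W2 + b2
    err = pred - y                       # gradient of MSE/2 w.r.t. pred
    # Backpropagation by hand
    gW2 = h.T @ err / len(x); gb2 = err.mean(0)
    dh = (err @ W2.T) * (1 - h ** 2)     # tanh' = 1 - tanh^2
    gW1 = x.T @ dh / len(x); gb1 = dh.mean(0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

mse = float(np.mean((np.tanh(x @ W1 + b1) @ W2 + b2 - y) ** 2))
```

The same recipe extends to 3-D: an INR for a shape maps (x, y, z) to occupancy or signed distance, and the trained weights are the shape.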
Our related publications
[ICLR '24] Domain-agnostic Latent Diffusion Models for Synthesizing High-Quality Implicit Neural Representations
[NeurIPS '23] Unconstrained Pose Prior-Free Neural Radiance Field
[ICCV '23] Semantic-Aware Implicit Template Learning via Part Deformation Consistency
[ICML '23] Robust Camera Pose Refinement for Multi-Resolution Hash Encoding
[CVPR '23] Self-positioning Point-based Transformer for Point Cloud Understanding
[NeurIPS '22] SageMix: Saliency-Guided Mixup for Point Clouds
[ICCV '21] Point Cloud Augmentation with Weighted Local Transformations
Machine learning models (deep neural networks in particular) are used in a variety of applications, including autonomous robots, vehicles, and drones. When deploying AI systems in the physical world, the reliability of algorithms is crucial for safety. Guaranteeing such safety involves specification, robustness, and assurance. Given a concrete purpose for the system (specification), the AI system should be robust to perturbations and attacks (adversarial examples). Further, the uncertainty of a model's predictions helps monitor and control the AI system's behavior. In this line of thought, we study model uncertainty (e.g., Bayesian neural networks) and adversarial examples from both attacker and defender perspectives. This topic lies at the intersection of AI and security.
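One intuition behind predictive uncertainty is ensemble disagreement: perturbed copies of a model agree on easy inputs and disagree near the decision boundary. The toy linear "ensemble" below illustrates only that intuition; it is not the sampling-free or Bayesian machinery from our publications, and all names are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

w_true = np.array([1.0, -1.0])                    # "true" linear boundary
ensemble = w_true + rng.normal(0, 0.3, (50, 2))   # 50 noisy model copies

def predict_with_uncertainty(x):
    """Mean class probability and ensemble disagreement (std) for input x."""
    probs = 1.0 / (1.0 + np.exp(-(ensemble @ x)))  # each member's sigmoid
    return probs.mean(), probs.std()

p_easy, std_easy = predict_with_uncertainty(np.array([3.0, -3.0]))  # far from boundary
p_hard, std_hard = predict_with_uncertainty(np.array([1.0, 1.0]))   # on the boundary
# Members disagree more near the boundary, so std_hard > std_easy.
```

Flagging inputs with high disagreement is a simple way such uncertainty estimates can be used to monitor a deployed system.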
Our related publications
[IEEE ACCESS '21] Search-and-Attack: Temporally Sparse Adversarial Perturbations on Videos
[ECCV '20] Robust Neural Networks inspired by Strong Stability Preserving Runge-Kutta methods
[UAI '19] Sampling-free Uncertainty Estimation in Gated Recurrent Units with Applications to Normative Modeling in Neuroimaging
[arXiv '18] Sampling-free Uncertainty Estimation in Gated Recurrent Units with Exponential Families
Medical imaging, and brain imaging in particular, inherently involves many structured measurements such as diffusion tensor images (DTI), high angular resolution diffusion images (HARDI), and ensemble average propagators (EAPs). Common goals in medical imaging are to identify regions related to a disease, detect diseases at an early stage, and model disease progression. To provide predictions and findings that are rigorously tested by statistics, more powerful pipelines are needed. We study more powerful representations of medical images and models (mixed effects models for structured data, filtering, dimensionality reduction, etc.). We also research few-shot detection, domain adaptation, and contrastive learning to deal with limited samples and labels in the medical domain.
Our related publications
[CVPR '17] Riemannian Nonlinear Mixed Effects Models: Analyzing Longitudinal Deformations in Neuroimaging
[CVPRW '17] Riemannian Variance Filtering: An Independent Filtering Scheme for Statistical Tests on Manifold-valued Data
[ECCV '15] Canonical Correlation Analysis on Riemannian Manifolds and its Applications
[CVPR '14] MGLM on Riemannian Manifolds with Applications to Statistical Analysis of Diffusion Weighted Images