Experiences
- 09/2020 - 06/2025: Ph.D. Student at the Multimedia Lab, SIAT, CAS. Topic: Diffusion Models.
- 06/2022 - 06/2024: Research Intern at Noah's Ark Lab. Topic: Diffusion Models.
- 01/2020 - 04/2020: Visiting Student at Brigham Young University (Undergraduate Thesis Program). Topic: FPGA.
- 09/2016 - 06/2020: B.Eng. Student at Sun Yat-sen University. Topic: Person Re-Identification.
|
News
- 05/2026: One paper accepted to TIP 2026!
- 02/2026: One paper accepted to CVPR 2026!
- 12/2025: Two papers accepted to ICASSP 2026!
- 11/2025: One paper accepted to AAAI 2026!
- 06/2025: One paper accepted to ICCV 2025!
- 05/2025: One paper accepted to ACL 2025!
- 02/2025: One paper accepted to TPAMI 2025!
|
|
Selected Publications
* indicates equal contribution, # indicates corresponding author
|
|
VideoChat-M1: Collaborative Policy Planning for Video Understanding via Multi-Agent Reinforcement Learning
Boyu Chen, Zikang Wang, Zhengrong Yue, Kainan Yan, Chenyun Yu, Yi Huang#, Zijun Liu, Yafei Wen, Xiaoxin Chen, Yang Liu, Peng Li, Yali Wang
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026
[Paper]
[arXiv]
We propose VideoChat-M1, a multi-agent video understanding framework built around collaborative policy planning. Multiple policy agents learn to generate, execute, and iteratively refine tool-use strategies through reinforcement learning, enabling stronger perception and reasoning on temporally and spatially complex videos across diverse benchmarks.
|
|
DIVE: Taming DINO for Subject-Driven Video Editing
Yi Huang, Wei Xiong, He Zhang, Chaoqi Chen, Jianzhuang Liu, Mingfu Yan, Shifeng Chen
IEEE/CVF International Conference on Computer Vision (ICCV), 2025
[Paper]
[Project]
[arXiv]
We propose DINO-guided Video Editing (DIVE) for subject-driven video editing conditioned on text prompts or reference images. By leveraging semantic features from a pretrained DINOv2 model, our framework simultaneously aligns motion trajectories for temporal consistency and learns targeted LoRAs to precisely preserve the subject's identity.
|
|
Diffusion Model-Based Image Editing: A Survey
Yi Huang, Jiancheng Huang, Yifan Liu, Mingfu Yan, Jiaxi Lv, Jianzhuang Liu, Wei Xiong, He Zhang, Liangliang Cao, Shifeng Chen
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025 (ESI Highly Cited)
[Paper]
[Project]
[arXiv]
We present a comprehensive survey of diffusion models for image editing, analyzing existing methods across various learning strategies, user-input conditions, and specific tasks. To further evaluate text-guided editing algorithms, we propose EditEval, a systematic benchmark featuring an innovative LMM Score metric.
|
|
Dual-Schedule Inversion: Training- and Tuning-Free Inversion for Real Image Editing
Jiancheng Huang*, Yi Huang*, Jianzhuang Liu, Donghao Zhou, Yifan Liu, Shifeng Chen
IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), oral, 2025
[Paper]
[arXiv]
We propose Dual-Schedule Inversion to address reconstruction failures common in DDIM Inversion for text-conditional image editing. By mathematically guaranteeing reversibility without fine-tuning, our method adaptively combines with various editing techniques to seamlessly modify targeted semantics while preserving the original identity of unedited regions.
|
|
WaveDM: Wavelet-Based Diffusion Models for Image Restoration
Yi Huang, Jiancheng Huang, Jianzhuang Liu, Mingfu Yan, Yu Dong, Jiaxi Lv, Chaoqi Chen, Shifeng Chen
IEEE Transactions on Multimedia (TMM), 2024 (ESI Highly Cited)
[Paper]
[Project]
[arXiv]
We propose a Wavelet-Based Diffusion Model (WaveDM) to address the slow inference times of diffusion-based image restoration. By learning clean image distributions in the wavelet domain with an Efficient Conditional Sampling (ECS) strategy, our approach achieves state-of-the-art performance across multiple tasks while being over 100x faster than vanilla diffusion models.
|
|
GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning
Jiaxi Lv*, Yi Huang*, Mingfu Yan, Jiancheng Huang, Jianzhuang Liu, Yifan Liu, Yafei Wen, Xiaoxin Chen, Shifeng Chen
IEEE/CVF Conference on Computer Vision and Pattern Recognition PBDL Workshop (CVPRW), Best Paper Runner-Up, 2024
[Paper]
[Project]
[arXiv]
We propose GPT4Motion, a training-free framework that leverages large language models and physics engines to address computational costs and motion coherency issues in text-to-video generation. By using GPT-4 to generate Blender scripts for physical simulation and integrating these components with Stable Diffusion, our method efficiently produces high-quality videos with consistent and physically accurate motions.
|
|
MagicEraser: Erasing Any Objects via Semantics-Aware Control
Fan Li, Zixiao Zhang, Yi Huang, Jianzhuang Liu, Renjing Pei, Bin Shao, Songcen Xu
European Conference on Computer Vision (ECCV), 2024
[Paper]
[arXiv]
We propose MagicEraser, a tailored diffusion-based framework for object erasure that addresses the incongruent generation and prompt-dependency issues of existing inpainting methods. By integrating prompt tuning and semantics-aware attention refocus modules within a two-phase generation process, alongside a novel data construction strategy, our approach achieves fine-grained control and effectively mitigates undesired artifacts.
|
|
Learning Image-Adaptive Lookup Tables With Spatial Awareness for Image Harmonization
Yi Huang, Yu Dong, He Zhang, Jiancheng Huang, Shifeng Chen
IEEE Transactions on Consumer Electronics (TCE), 2023
[Paper]
We propose a novel, memory-efficient approach to high-resolution image harmonization by utilizing image-adaptive lookup tables (LUTs) instead of resource-heavy full image reconstruction. By employing a deep model to fuse basic LUTs for global color transformation alongside a spatial attention module for local refinement, our method achieves competitive performance with a significantly smaller model footprint.
|
Honors and Awards
- CAS Presidential Excellence Award, 2025 (中国科学院院长优秀奖)
- SIAT Special Prize of the Excellent Graduate Student Scholarship, 2025 (深圳先进技术研究院优秀研究生奖学金特别奖)
- SYSU Excellence Scholarship, 2017, 2018, 2019 (中山大学优秀学生奖学金)
|
Academic Services
- Conference Reviewer: CVPR, ICCV, ECCV, NeurIPS, ICML, ICLR, ACL, AAAI, ACM MM, AISTATS, WACV
- Journal Reviewer: IEEE Transactions (TPAMI, TIP, TMM, TCSVT, TAI), IJCV, PR, CVIU
|
© Yi Huang | Last updated: March 2026
|