Research
My current research focuses on Multimodal Large Language Models (MLLMs), in particular reinforcement learning for aligning Vision-Language Models and advancing their reasoning ability in complex scenarios.
PixelCraft: A Multi-Agent System for High-Fidelity Visual Reasoning on Structured Images
Shuoshuo Zhang, Zijian Li, Yizhen Zhang, Jingjing Fu, Lei Song, Jiang Bian, Jun Zhang, Yujiu Yang, Rui Wang
Under Review
Preprint / Code
PeRL: Permutation-Enhanced Reinforcement Learning for Interleaved Vision-Language Reasoning
Yizhen Zhang, Yang Ding, Shuoshuo Zhang, Xinchen Zhang, Haoling Li, Zhong-zhi Li, Peijie Wang, Jie Wu, Lei Ji, Yelong Shen, Yujiu Yang, Yeyun Gong
NeurIPS 2025
Preprint / Code
Teaching Your Models to Understand Code via Focal Preference Alignment
Jie Wu, Haoling Li, Xin Zhang, Xiao Liu, Yangyu Huang, Jianwen Luo, Yizhen Zhang, Zuchao Li, Ruihang Chu, Yujiu Yang, Scarlett Li
EMNLP 2025 (Main)
Preprint / Code
Efficiently Building Large Language Models through Merging
Yizhen Zhang, Yang Ding, Jie Wu, Yujiu Yang
NeurIPS 2024 LMC Oral
Preprint
Honors & Awards
- NeurIPS 2024 Large Language Model Merging Challenge (LMC) [link], Rank: 1/150, 2024
- CVPR 2023 Workshop Image Matching Challenge [link], Silver Medal, 2023