Wenhu Chen

Wenhu Chen [陈文虎 in Chinese]

Assistant Professor of Computer Science at the University of Waterloo

Vector Institute, CIFAR AI Chair

Senior Research Scientist at Google DeepMind (20% Part-time)

Email: wenhuchen [at] uwaterloo [dot] ca

Google Scholar / CV (updated in Jan 25) / GitHub / Twitter

Biography

Wenhu Chen has been an assistant professor in the Computer Science Department at the University of Waterloo and at the Vector Institute since 2022. He received the Canada CIFAR AI Chair Award in 2022. He has also worked at Google DeepMind as a part-time research scientist since 2021. Before that, he obtained his PhD from the University of California, Santa Barbara under the supervision of William Wang and Xifeng Yan. His research interests lie in natural language processing, deep learning, and multimodal learning. He aims to design models that handle complex reasoning scenarios such as math problem-solving and structured knowledge grounding. He is also interested in building more powerful multimodal models to bridge different modalities. He received the Area Chair Award at AACL-IJCNLP 2023, the Best Paper Honorable Mention at WACV 2021, and the UCSB CS Outstanding Dissertation Award in 2021.

Research Interests

My research interests cover the following areas:

Reasoning
Controllable GenAI
Information Retrieval
Benchmarks and Evaluation

Research Highlights

You might have heard of me because of the following work I conducted.

1. Natural Language Processing (LLMs)

MAmmoTH/MAmmoTH2/Critique Fine-Tuning: Advancing reasoning models to solve complex reasoning tasks
Program-of-Thoughts: A prompting strategy that uses tools to solve complex reasoning tasks
OpenCodeInterpreter/AceCoder: Advanced coding language models for complex tasks
MAP-Neo/Fine-FineWeb: Fully open-source language models with high-quality pre-training datasets
KB-BINDER/TableCoT/StructLM: Grounding foundation models on structured knowledge

2. Multimodal Understanding (Image + Video)

MuRAG/UniIR/MagicLens/VLM2Vec/DSE/VISA: Frameworks that enable unified and compositional multimodal information retrieval
Mantis/MAmmoTH-VL/MAmmoTH-VL2: Advanced vision-language models with better reasoning skills
VISTA/Vamba: Long video understanding models

3. Multimodal Generation (Image + Video)

Re-Imagen/SuTI/Kosmos-G/Instruct-Imagen: Effective, efficient, and controllable image generation models
T2V-Turbo/T2V-Turbo-v2: Efficient text-to-video generation models
MagicBrush/OmniEdit/AnyV2V: Powerful image and video editing models

4. Benchmarks & Evaluation

MMMU/MMLU-Pro/TheoremQA/MEGA-Bench: Commonly used evaluation suites for language models and vision-language models
TabFact/HybridQA/OTT-QA: Table and text reasoning evaluation benchmarks

5. Others

MERT/ChatMusician/YuE: Foundation models for understanding and composing music
TheoremExplainAgent: Building agents for composing educational videos

TIGER Lab

I direct the Text and Image GEnerative Research (TIGER) lab. My lab studies generative models across modalities, including text, images, video, and music. We are committed to building powerful state-of-the-art models for various domains. Our lab is always looking for talented and self-motivated students.

Awards

2025: Math Golden Jubilee Award
2024: CVPR Best Paper Finalist
2023: AACL-IJCNLP23 Area Chair Award
2022: Canada CIFAR AI Chair
2021: UCSB CS Outstanding Dissertation Award
2021: WACV21 Best Student Paper Honorable Mention
2018: Tencent Rhino-Bird Award
2016: IDEA Research Grant

Funding

CIFAR AI Chair Funding: Accessing Diverse Web Knowledge with Natural Language Interface (2022 - 2027)
NSERC Discovery Fund: Building Semiparametric Models to Decouple Knowledge from Computation (2023 - 2028)
Mitacs Accelerate Fund: Question Answering over Long Clinical Documents (2024 - 2026)
CIFAR AI Catalyst Fund: Generating Images with Multimodal Instruction (2024 - 2026)
National Research Council Canada - AI4D Funding: Accelerating Scientific Discovery with Foundation Models (2024 - 2026)
National Research Council Canada - New Beginning Funding: Building More Efficient Visual Generative Models (2025 - 2026)