Wenhu Chen [陈文虎 in Chinese]



Assistant Professor of Computer Science at the University of Waterloo

Vector Institute, CIFAR AI Chair

Senior Research Scientist at Google DeepMind (20% part-time)

Email: wenhuchen [at] uwaterloo [dot] ca

Google Scholar  /  Github  /  Twitter


Biography

Wenhu Chen has been an assistant professor in the Computer Science Department at the University of Waterloo and at the Vector Institute since 2022. He received a Canada CIFAR AI Chair in 2022. He has also worked at Google DeepMind as a part-time research scientist since 2021. Before that, he obtained his PhD from the University of California, Santa Barbara under the supervision of William Wang and Xifeng Yan. His research interests lie in natural language processing, deep learning, and multimodal learning. He aims to design models that handle complex reasoning scenarios such as math problem solving and structured knowledge grounding. He is also interested in building more powerful multimodal models to bridge different modalities. He received the Area Chair Award at AACL-IJCNLP 2023, the Best Paper Honorable Mention at WACV 2021, and the UCSB CS Outstanding Dissertation Award in 2021.

Research Interest

My research interests cover the following areas:
  • Utilizing large language models to perform complex reasoning.
  • Building more controllable image generation models, such as subject-driven image generation and image editing.
  • Building the next generation of music understanding and generation models.
  • Building multimodal retrieval systems and enhancing interleaved multimodal content understanding.
  • Designing more explainable and accurate metrics to evaluate state-of-the-art generative models.

Research Highlight

You may have heard of me through the following work:
  • MMMU/MMLU-Pro: widely used evaluation suites for language models and vision-language models.
  • MAmmoTH/MAmmoTH2: reasoning models that achieved state-of-the-art results in 2023 and 2024.
  • Re-Imagen/SuTI/Instruct-Imagen: effective, efficient, and controllable image generation models, adopted in Google Cloud Vertex AI.
  • Program-of-Thought: a prompting strategy that uses tools to solve complex reasoning tasks.
  • UniIR/MagicLens: frameworks that enable unified and compositional multimodal information retrieval.
  • MERT/ChatMusician: language models that understand and compose music.
  • ConsistI2V/T2V-Turbo/AnyV2V: efficient image-to-video generation, text-to-video generation, and video editing models.
  • MAP-Neo: a fully open-source language model approaching the performance of Mistral-7B.
  • TabFact/HybridQA/StructLM: structured knowledge grounding datasets and foundation models.

TIGER Lab

I direct the Text and Image GEnerative Research (TIGER) Lab. The lab studies generative models across modalities, including text, images, videos, and music, and is committed to building powerful state-of-the-art models for various domains. We are always looking for talented and self-motivated students.

M-A-P

I am one of the founding members of Multimodal Art Projection (M-A-P), an open-source research community. Community members work on Artificial Intelligence-Generated Content (AIGC) topics spanning text, audio, and vision modalities. We train large language/music/multimodal models (LLMs/LMMs), collect data, and develop fun applications.

Awards

  • 2024: CVPR Best Paper Finalist
  • 2023: AACL-IJCNLP23 Area Chair Award
  • 2022: Canada CIFAR AI Chair
  • 2021: UCSB CS Outstanding Dissertation Award
  • 2021: WACV21 Best Student Paper Honorable Mention
  • 2018: Tencent Rhino-Bird Award
  • 2016: IDEA Research Grant

Funding

  • CIFAR AI Chair Funding: Accessing Diverse Web Knowledge with Natural Language Interface (2022 - 2027)
  • NSERC Discovery Fund: Building Semiparametric Models to Decouple Knowledge from Computation (2023 - 2028)
  • Mitacs Accelerate Fund: Question Answering over Long Clinical Documents (2024 - 2026)
  • CIFAR AI Catalyst Fund: Generating Images with Multimodal Instruction (2024 - 2026)

Media Coverage