Wenhu Chen [陈文虎 in Chinese]



Assistant Professor of Computer Science at the University of Waterloo

Vector Institute, CIFAR AI Chair

Senior Research Scientist at Google DeepMind (20% Part-time)

Email: wenhuchen [at] uwaterloo [dot] ca

Google Scholar  /  Github  /  Twitter


Biography

Wenhu Chen has been an assistant professor in the Computer Science Department at the University of Waterloo and a faculty member at the Vector Institute since 2022. He received the Canada CIFAR AI Chair Award in 2022. He has also worked for Google DeepMind as a part-time research scientist since 2021. Before that, he obtained his PhD from the University of California, Santa Barbara under the supervision of William Wang and Xifeng Yan. His research interests lie in natural language processing, deep learning, and multimodal learning. He aims to design models that handle complex reasoning scenarios such as math problem-solving and structured knowledge grounding. He is also interested in building more powerful multimodal models to bridge different modalities. He received the Area Chair Award at AACL-IJCNLP 2023, the Best Paper Honorable Mention at WACV 2021, and the UCSB CS Outstanding Dissertation Award in 2021.

Research Interest

My research interests cover the following areas:
  • Utilizing large language models to perform complex reasoning.
  • Building more controllable image generation models, such as subject-driven image generation and image editing.
  • Building the next-generation Music Understanding and Generation models.
  • Building multimodal retrieval systems and enhancing interleaved multimodal content understanding.
  • Designing more explainable and accurate metrics to evaluate SoTA generative models.

Research Highlight

You might have heard of me because of the following work I conducted.
  • MMMU/MMLU-Pro: commonly used evaluation suites for language models and vision-language models.
  • MAmmoTH/MAmmoTH2: reasoning models that achieved SoTA results in 2023 and 2024.
  • Re-Imagen/SuTI/Instruct-Imagen: effective, efficient, and controllable image generation models. This line of work is adopted in Google Cloud Vertex AI.
  • Program-of-Thoughts: a prompting strategy that uses tools to solve complex reasoning tasks.
  • AnyV2V: a broadly compatible video editing tool for all purposes.
  • TabFact/HybridQA: commonly adopted table reasoning datasets.
  • MERT/ChatMusician/MuPT: language models that allow you to understand and compose music.
  • OpenCodeInterpreter: an open replication of OpenAI's Code Interpreter that achieves SoTA on code generation tasks.

TIGER Lab

I direct the Text and Image GEnerative Research (TIGER) lab. My lab studies generative models across modalities, including text, images, videos, and music. We are committed to building powerful state-of-the-art models for various domains. Our lab is always looking for talented and self-motivated students.

M-A-P

I am one of the founding members of Multimodal Art Projection (M-A-P), an open-source research community. The community members work on Artificial Intelligence-Generated Content (AIGC) topics across text, audio, and vision modalities. We train large language, music, and multimodal models (LLMs/LMMs), collect data, and develop fun applications.

Awards

  • 2023: AACL-IJCNLP23 Area Chair Paper Award
  • 2022: Canada CIFAR AI Chair
  • 2021: UCSB CS Outstanding Dissertation Award
  • 2021: WACV21 Best Student Paper Honorable Mention
  • 2018: Tencent Rhino-Bird Award
  • 2016: IDEA Research Grant