Mingda Zhang

I am a Senior Research Scientist at Google DeepMind, focusing mainly on video generation (Veo).

I received my Ph.D. from the Department of Computer Science at the University of Pittsburgh. Before coming to Pitt, I obtained my B.S. in Chemical Biology from Peking University.

My research interests lie broadly in artificial intelligence and machine learning, including applications in computer vision and natural language processing. I was very fortunate to be co-advised by Prof. Adriana Kovashka and Prof. Rebecca Hwa.

Research

Publications

Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities.
Gemini Team, Google.
Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps
Nanye Ma, Shangyuan Tong, Haolin Jia, Hexiang Hu, Yu-Chuan Su, Mingda Zhang, Xuan Yang, Yandong Li, Tommi Jaakkola, Xuhui Jia, Saining Xie.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2025.
OmnixR: Evaluating Omni-modality Language Models on Reasoning across Modalities
Lichang Chen, Hexiang Hu, Mingda Zhang, Yiwen Chen, Zifeng Wang, Yandong Li, Pranav Shyam, Tianyi Zhou, Heng Huang, Ming-Hsuan Yang, Boqing Gong.
The Thirteenth International Conference on Learning Representations (ICLR), April 2025.
VIEWS: Entity-Aware News Video Captioning
Hammad Ayyubi, Tianqi Liu, Arsha Nagrani, Xudong Lin, Mingda Zhang, Anurag Arnab, Feng Han, Yukun Zhu, Xuande Feng, Kevin Zhang, Jialu Liu, Shih-Fu Chang.
Conference on Empirical Methods in Natural Language Processing (EMNLP), November 2024.
Video Timeline Modeling for News Story Understanding
Meng Liu, Mingda Zhang, Jialu Liu, Hanjun Dai, Ming-Hsuan Yang, Shuiwang Ji, Zheyun Feng, Boqing Gong.
Conference on Neural Information Processing Systems (NeurIPS), Datasets and Benchmarks Track, December 2023. (Spotlight)
(pdf)
Train-Once-for-All Personalization
Hong-You Chen, Yandong Li, Yin Cui, Mingda Zhang, Wei-Lun Chao, Li Zhang.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2023.
(pdf)

Publications before joining Google:

How to Practice VQA on a Resource-limited Target Domain
Mingda Zhang, Rebecca Hwa, Adriana Kovashka.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), January 2023.
(project) (pdf) (bibtex)

@InProceedings{Zhang_2023_WACV,
    author    = {Zhang, Mingda and Hwa, Rebecca and Kovashka, Adriana},
    title     = {How To Practice VQA on a Resource-Limited Target Domain},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {January},
    year      = {2023},
}

Learning to Overcome Noise in Weak Caption Supervision for Object Detection
Mesut Erhan Unal, Keren Ye, Mingda Zhang, Christopher Thomas, Adriana Kovashka, Wei Li, Danfeng Qin, Jesse Berent.
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022.
(pdf) (bibtex)

@article{unal2022learning,
    title={Learning to Overcome Noise in Weak Caption Supervision for Object Detection},
    author={Unal, Mesut Erhan and Ye, Keren and Zhang, Mingda and Thomas, Christopher and Kovashka, Adriana and Li, Wei and Qin, Danfeng and Berent, Jesse},
    journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
    year={2022}
}

Abg-CoQA: Clarifying Ambiguity in Conversational Question Answering
Meiqi Guo, Mingda Zhang, Siva Reddy, Malihe Alikhani.
Proceedings of the 3rd Conference on Automated Knowledge Base Construction (AKBC), October 2021.
(pdf) (bibtex)

@InProceedings{Guo_2021_AbgCoQA,
    author = {Guo, Meiqi and Zhang, Mingda and Reddy, Siva and Alikhani, Malihe},
    title = {Abg-Co{QA}: Clarifying Ambiguity in Conversational Question Answering},
    booktitle = {3rd Conference on Automated Knowledge Base Construction (AKBC)},
    year = {2021}
}

Domain-robust VQA with Diverse Datasets and Methods but No Target Labels
Mingda Zhang, Tristan Maidment, Ahmad Diab, Adriana Kovashka, Rebecca Hwa.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2021.
(project) (pdf) (poster) (slides) (arxiv) (bibtex)

@InProceedings{Zhang_2021_Domain,
    author = {Zhang, Mingda and Maidment, Tristan and Diab, Ahmad and Kovashka, Adriana and Hwa, Rebecca},
    title = {Domain-robust VQA with Diverse Datasets and Methods but No Target Labels},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month = {June},
    year = {2021}
}

BasisNet: Two-stage Model Synthesis for Efficient Inference
Mingda Zhang, Chun-Te Chu, Andrey Zhmoginov, Andrew Howard, Brendan Jou, Yukun Zhu, Li Zhang, Rebecca Hwa, Adriana Kovashka.
4th Workshop on Efficient Deep Learning for Computer Vision (ECV21), Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshop, June 2021. (Best Paper Award)
(pdf) (supplementary) (slides) (arxiv) (bibtex)

@InProceedings{Zhang_2021_BasisNet,
    author = {Zhang, Mingda and Chu, Chun-Te and Zhmoginov, Andrey and Howard, Andrew and Jou, Brendan and Zhu, Yukun and Zhang, Li and Hwa, Rebecca and Kovashka, Adriana},
    title = {BasisNet: Two-Stage Model Synthesis for Efficient Inference},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month = {June},
    year = {2021}
}

Breaking Shortcuts by Masking for Robust Visual Reasoning
Keren Ye, Mingda Zhang, Adriana Kovashka.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), January 2021.
(pdf) (supplementary) (bibtex)

@InProceedings{Ye_2021_Shortcut,
    author = {Ye, Keren and Zhang, Mingda and Kovashka, Adriana},
    title = {Breaking Shortcuts by Masking for Robust Visual Reasoning},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month = {January},
    year = {2021}
}

Story Completion with Explicit Modeling of Commonsense Knowledge
Mingda Zhang, Keren Ye, Rebecca Hwa, Adriana Kovashka.
Minds vs. Machines: How far are we from the common sense of a toddler?, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshop, June 2020.
(pdf) (recording) (bibtex)

@InProceedings{Zhang_2020_Story,
    author = {Zhang, Mingda and Ye, Keren and Hwa, Rebecca and Kovashka, Adriana},
    title = {Story Completion With Explicit Modeling of Commonsense Knowledge},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month = {June},
    year = {2020}
}

Monitoring ICU Mortality Risk with a Long Short-Term Memory Recurrent Neural Network
Ke Yu*, Mingda Zhang*, Tianyi Cui*, Milos Hauskrecht. (*equal contribution)
Proceedings of the Pacific Symposium on Biocomputing (PSB), January 2020. (Oral)
(pdf) (bibtex)

@InProceedings{Yu_2019_Monitoring,
    author = {Yu, Ke and Zhang, Mingda and Cui, Tianyi and Hauskrecht, Milos},
    title = {Monitoring ICU Mortality Risk with a Long Short-Term Memory Recurrent Neural Network},
    booktitle = {Proceedings of the Pacific Symposium on Biocomputing (PSB)},
    month = {January},
    year = {2020}
}

Interpreting the Rhetoric of Visual Advertisements
Keren Ye, Narges Honarvar Nazari, James Hahn, Zaeem Hussain, Mingda Zhang, Adriana Kovashka.
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2019.
(pdf) (bibtex)

@Article{Ye_2019_Interpreting,
    author = {Ye, Keren and Nazari, Narges Honarvar and Hahn, James and Hussain, Zaeem and Zhang, Mingda and Kovashka, Adriana},
    journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)},
    title = {Interpreting the Rhetoric of Visual Advertisements},
    year = {2021},
    volume = {43},
    number = {4},
    pages = {1308-1323},
    doi = {10.1109/TPAMI.2019.2947440}
}

Cap2Det: Learning to Amplify Weak Caption Supervision for Object Detection
Keren Ye, Mingda Zhang, Adriana Kovashka, Wei Li, Danfeng Qin, Jesse Berent.
Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), October 2019.
(pdf) (supplementary) (code) (arxiv) (bibtex)

@InProceedings{Ye_2019_Cap2Det,
    author = {Ye, Keren and Zhang, Mingda and Kovashka, Adriana and Li, Wei and Qin, Danfeng and Berent, Jesse},
    title = {Cap2Det: Learning to Amplify Weak Caption Supervision for Object Detection},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month = {October},
    year = {2019}
}

Equal But Not The Same: Understanding the Implicit Relationship Between Persuasive Images and Text
Mingda Zhang, Rebecca Hwa, Adriana Kovashka.
Proceedings of the British Machine Vision Conference (BMVC), September 2018. (Spotlight)
(project) (dataset) (pdf) (slides) (recording) (arxiv) (bibtex)

@InProceedings{Zhang_2018_Equal,
    author = {Zhang, Mingda and Hwa, Rebecca and Kovashka, Adriana},
    title = {Equal But Not The Same: Understanding the Implicit Relationship Between Persuasive Images and Text},
    booktitle = {Proceedings of the British Machine Vision Conference (BMVC)},
    month = {September},
    year = {2018}
}

Automatic Understanding of Image and Video Advertisements
Zaeem Hussain, Mingda Zhang, Xiaozhong Zhang, Keren Ye, Chris Thomas, Zuha Agha, Nathan Ong, Adriana Kovashka.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017. (Spotlight)
(project) (pdf) (poster) (slides) (recording) (arxiv) (bibtex)

@InProceedings{Hussain_2017_Automatic,
    author = {Hussain, Zaeem and Zhang, Mingda and Zhang, Xiaozhong and Ye, Keren and Thomas, Christopher and Agha, Zuha and Ong, Nathan and Kovashka, Adriana},
    title = {Automatic Understanding of Image and Video Advertisements},
    booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    month = {July},
    year = {2017}
}

Multistep DNA-templated Synthesis using a Universal Template
Yizhou Li, Peng Zhao, Mingda Zhang, Xianyuan Zhao, Xiaoyu Li.
Journal of the American Chemical Society (JACS), 135, 17727-17730 (2013).

DNA-directed Formation of Peptide Bond: A Model Study toward DNA-programmed Peptide Ligation
Chi Zhang, Yizhou Li, Mingda Zhang, Xiaoyu Li.
Tetrahedron, 68, 5152-5156 (2012).

Detection of Bond Formations by DNA-programmed Chemical Reactions and PCR Amplification
Yizhou Li, Mingda Zhang, Chi Zhang, Xiaoyu Li.
Chemical Communications, 48, 9513-9515 (2012).

Experience

Work

(2021.12 - now) Senior Research Scientist at Google DeepMind
(2020.5 - 2020.8) Research Intern at Google Geo
(2020.1 - 2020.5) Student Researcher at Google Research
(2019.5 - 2019.8) Research Intern at Google Research
(2018.5 - 2018.8) Ph.D. Software Engineering Intern at Google Research

Professional Service

Conference Reviewer:
- CVPR: 2019-2025;
- ICCV: 2019, 2021, 2023;
- ECCV: 2020, 2024;
- NeurIPS: 2020-2024;
- AAAI: 2020-2022;
- ICLR: 2021 (Outstanding Reviewer), 2022, 2025;
- ICML: 2023-2024;
- ACCV: 2020;
- WACV: 2021-2023.

Contact

Email: mngdaz@gmail.com