Yaodong Yang 杨耀东 · Boya Young Scholar

Assistant Professor, PKU · Chief Scientist, PKU–PsiBot Lab

Dr. Yaodong Yang is an Assistant Professor (Boya Young Scholar) at the Institute for Artificial Intelligence, Peking University, and Chief Scientist of the PKU–PsiBot Joint Laboratory. His research focuses on experience learning and alignment for AI and embodied agents, spanning reinforcement learning, AI alignment, and embodied intelligence, with the aim of advancing the trustworthy deployment and real-world alignment of large models.

He has published over 200 papers in leading journals and conferences, including Nature Machine Intelligence, Matter (Cell Press), Artificial Intelligence Journal, and IEEE TPAMI, with more than 16,000 Google Scholar citations. Since 2022, he has ranked first among AI & ML scholars at Peking University on CSRankings.

Dr. Yang has received numerous honors, including the ACL 2025 Best Paper Award, the UKRI 2026 Best Paper Award in AI, selection as an ICCV 2023 Best Paper Finalist, the CoRL 2020 Best System Paper Award, and the AAMAS 2021 Blue-Sky Idea Award.

He was named to the MIT Technology Review "AI 100 Young Innovators" list and the 2025 Forbes China Technology & Innovation Leaders list, and received the WAIC 2022 "Yunfan Star Award" and the ACM SIGAI China Rising Star Award. His work has been featured by CCTV, People's Daily, Xinhua News, the National Natural Science Foundation of China (NSFC), and MIT Technology Review.

He serves as an Area Chair for major conferences including ICML, ICLR, NeurIPS, AAAI, IJCAI, AAMAS, and IROS, and as an Associate Editor for Scientific Reports, Transactions on Machine Learning Research, and Neural Networks.

Previously, Dr. Yang was an Assistant Professor at King's College London, a Principal Researcher at Huawei Research U.K., and a Senior Manager at AIG. He received his B.Eng. from the University of Science and Technology of China, his M.Sc. from Imperial College London, and his Ph.D. from University College London, where he was the university's sole nominee for the ACM SIGAI Doctoral Dissertation Award.

CSRankings · #1 PKU AI+ML | Best Paper Awards · Five times | Elsevier · World Top 2% Scientist
200+
Publications
Nature MI · Matter · JMLR · TPAMI
16k+
Citations
Google Scholar · h-index 60
#1
PKU AI+ML Rank since 2022
CSRankings · AIRankings
5+
Best-Paper-Level Awards
ACL · UKRI · CoRL · ICCV · AAMAS

News

Headlines · recent updates

Full timeline · 36 entries · ICML 2026 · NeurIPS 2025 · ACL 2025 Best Paper · ICLR · CoRL 2020. Selected entries:

2024 · 10

Invited talk "Can LLMs be aligned?" at CNCC 2024.

2024 · 03

Co-signed the Beijing AI Safety Declaration with leading scientists.

2024 · 02

Featured on CCTV's "Focus Report" (焦点访谈), a national TV report on AI safety.

2023 · 11

Released the AI Alignment Survey.

2023 · 10

Paper on the ICCV 2023 Best Paper initial list (top 17 of 8,260 submissions).

2023 · 06

TorchOpt officially joined the PyTorch Ecosystem.

2022 · 12

NeurIPS 2022 MyoChallenge: 1st place (1 of 340 teams).

2022 · 05

Paper accepted at IJCAI 2022.

2022 · 04

TorchOpt and Bi-DexHands open-sourced.

2021 · 02

AAMAS 2021 Blue-Sky Idea Best Paper Award.

2020 · 10

SMARTS platform released; CoRL 2020 Best System Paper Award.

2020 · 06

Paper accepted at ICML 2020.

Research

Five directions · methods, benchmarks, and open-source systems

01 / RL for Alignment

LLM Alignment & RLHF

RLHF, preference learning, safe alignment, red-teaming and interpretability. Principled methods and open benchmarks — BeaverTails, PKU-SafeRLHF, Stream Aligner, Libra-Leaderboard — to make LLMs robustly helpful and harmless.

02 / RL for Embodied AI

Embodied Reinforcement Learning

Dexterous manipulation, vision-language-action models, and sim-to-real. From Bi-DexHands and ClutterDexGrasp to DexGraspVLA and Safe VLA — pursuing human-level generalist robotic agents.

03 / Multi-agent RL

Multi-Agent RL

Cooperative and competitive MARL, policy gradient theory, Nash equilibria. HARL, MAT, MARLlib, MALib — algorithms that scale to hundreds of agents.

04 / Agentic RL

Agentic RL & Social Simulation

LLM-based agents for macroeconomic modelling, social value orientation, negotiation and consensus. World models unifying physical and social dynamics.

05 / RL for Science

RL for Science

RL and LLMs applied to mathematics, medicine, physics, materials (carbon-nanotube synthesis), and operations — featured in Cell iScience, Matter, and National Science Review.

Press

National coverage · CCTV · Xinhua · NSFC · MIT Tech Review

CCTV · Xinhua News · People's Daily · MIT Tech Review

Awards

Best papers · talent programs · academic honors · competitions

I. Best-Paper Awards 5 awards
2026

UKRI Best Research Paper in AI

Efficient and Scalable Reinforcement Learning for Large-Scale Network Control · Nature Machine Intelligence

2025

ACL 2025 Best Paper Award

Language Models Resist Alignment: Evidence From Data Compression

2023

ICCV 2023 Best Paper Finalist

UniDexGrasp: Universal Robotic Dexterous Grasping via Learning Diverse Proposal Generation and Goal-Conditioned Policy

2021

AAMAS 2021 Blue-Sky Idea Award

Diverse Auto-Curriculum is Critical for Successful Real-World Multiagent Learning Systems

2020

CoRL 2020 Best System Paper Award

SMARTS: An Open-Source Scalable Multi-Agent RL Training School for Autonomous Driving

II. Talent Programs 3 programs
2024

National Young Talent

NSFC Excellent Young Scientist

2023

CAST Youth Talent Support Program

CAAI — 6 selected nationally

2022

High-Level Overseas Talent

Ministry of Human Resources — 30 nationwide

III. Academic Honors 5 honors
2026

Forbes China — Innovation Leader

Forbes China · Innovation & Tech Leaders

2025

Elsevier / Stanford World Top 2% Scientists

Global Top 2% career-impact ranking

2025

MIT Tech Review — AI 100 Young Innovators

MIT Technology Review · "AI 100 Young Innovators"

2022

ACM SIGAI China Rising Star Award

ACM SIGAI China · 3 awardees nationwide

2022

WAIC Yunfan Award — Rising Star

WAIC · 10 awardees nationwide

IV. Competitions & Industry 4 awards
2025

Wu Wenjun AI S&T Award · 2nd Prize

Wu Wenjun AI S&T Award · 2nd Prize — Knowledge-Enhanced Trustworthy Multimodal Interaction

2025

CMSA Meteorological Tech Invention Award · 1st Prize

CMSA · 1st Prize for Technological Invention — BeiDou + AI for Extreme-Wind Emergency Navigation

2025

Digital China Innovation Contest · AI Track 1st Prize

Digital China Innovation Contest · AI Track · National 1st Prize

2022

NeurIPS 2022 MyoChallenge · Winner

Physiologically realistic dexterous manipulation · 1st of 340 teams

Mentorship

Highest PKU student honors · Apple & Tencent fellowships · NSFC grants

2024 Highest Student Honor · PKU

PKU May-4th Medal

Yiran Geng 耿逸然 (2024) · Boyuan Chen 陈博远 (2026)
PKU's highest honor for students (once every two years) · sole recipient among all undergraduate STEM majors.
2024 University-Wide · PKU

PKU Annual Figures

Jiaming Ji 吉嘉铭 (2025) · Boyuan Chen 陈博远 (2025)
Two PAIR-Lab students named PKU Annual Figures — one of the most prestigious annual recognitions at Peking University.
2025 Industry Fellowship · Apple

Apple Scholars in AI / ML

Jiaming Ji 吉嘉铭
Apple PhD Fellowship (2025) — one of only 12 scholars selected globally.
2025 Industry Fellowship · Tencent

Tencent Hunyuan Scholar

Jiaming Ji 吉嘉铭
Tencent's flagship PhD fellowship for top AI students in China.
2024 NSFC · PhD Student Grant

NSFC Young Student
Basic Research (PhD)

Jiaming Ji 吉嘉铭
Sole PhD awardee in PKU's AI direction — NSFC Young Student Basic Research Program (PhD).
2024 NSFC · Undergraduate Grant

NSFC Young Student
Basic Research (UG)

Tianyi Qiu 邱天异
One of only two undergraduates in PKU's AI direction to receive this grant.
Teaching Awards
2026

PKU Teaching Achievement Award · 2nd Prize

For the course "Foundations and Alignment of Large Language Models" (《大语言模型基础与对齐》).

2025

ICBC Teaching Award · PKU

ICBC Teaching Award · PKU · 2025

2022–

Class Advisor · Yuanpei AGI Experimental Class

Yuanpei College · Class Advisor & Curriculum Committee · AGI Experimental Class (2022 cohort)

2023 – 2025

Outstanding Undergraduate Research Supervisor · PKU

Awarded three years in a row (2023, 2024, 2025) by Peking University.

Publications

Representative works · browse by topic below

2026 1 paper
ALN
Evolving Diverse Red-team Language Models in Multi-round Multi-agent Games *
Chengdong Ma, Ziran Yang, Hai Ci, Jun Gao, Minquan Gao, Xuehai Pan, Yaodong Yang#
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
Red-teaming · Multi-Agent RL · LLM
2025 2 papers
ALN
Language Models Resist Alignment: Evidence From Data Compression *
Jiaming Ji, Kaile Wang, Tianyi Alex Qiu, Boyuan Chen, Jiayi Zhou, Changye Li, Hantao Lou, Josef Dai, Yunhuai Liu, Yaodong Yang#
ACL 2025 ★ Best Paper
Alignment Theory · Alignment · LLM
ALN
Safe VLA: Towards Safety Alignment of Vision-Language-Action Model via Safe Reinforcement Learning *
Borong Zhang, Yuhao Zhang, Jiaming Ji, Yingshan Lei, Josef Dai, Yuanpei Chen, Yaodong Yang#
NeurIPS 2025 Spotlight
Safe VLA · VLA · Safe RL · Safety · Alignment
2024 7 papers
EMB
ASP: Learn a Universal Neural Solver *
Chenguang Wang, Zhouliang Yu, Stephen McAleer, Tianshu Yu, Yaodong Yang#
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
Combinatorial Optimization
ALN
Aligner: Efficient Alignment by Learning to Correct *
Jiaming Ji, Boyuan Chen, Hantao Lou, Donghai Hong, Borong Zhang, Xuehai Pan, Juntao Dai, Yaodong Yang#
NeurIPS 2024 Oral
Aligner · Alignment
EMB
Bi-DexHands: Towards Human-Level Bimanual Dexterous Manipulation *
Yuanpei Chen, Yiran Geng, Fangwei Zhong, Jiaming Ji, Jiechuang Jiang, Zongqing Lu, Hao Dong, Yaodong Yang#
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
Bimanual · Dexterous Manipulation · Robotics
AI4
Efficient and scalable reinforcement learning for large-scale network control *
Chengdong Ma, Aming Li, Yali Du, Hao Dong, Yaodong Yang#
Nature Machine Intelligence ★ Best Paper
Network Control · Reinforcement Learning
MRL
Heterogeneous-Agent Reinforcement Learning *
Yifan Zhong, Jakub Grudzien Kuba, Xidong Feng, Siyi Hu, Jiaming Ji, Yaodong Yang#
Journal of Machine Learning Research (JMLR)
HARL · Reinforcement Learning
ALN
Omnisafe: An infrastructure for accelerating safe reinforcement learning research *
Jiaming Ji, Jiayi Zhou, Borong Zhang, Juntao Dai, Xuehai Pan, Ruiyang Sun, Weidong Huang, Yiran Geng, Mickel Liu, Yaodong Yang#
Journal of Machine Learning Research (JMLR)
OmniSafe · Safe RL · Reinforcement Learning
AI4
Transforming the synthesis of carbon nanotubes with machine learning models and automation *
Yue Li, Shurui Wang, Zhou Lv, Zhaoji Wang, Yunbiao Zhao, Ying Xie, Yang Xu, Liu Qian, Yaodong Yang#, Ziqiang Zhao#, Jin Zhang#
Matter (Cell Press)
Carbon Nanotubes · Materials Synthesis
Media coverage · Xinhua
2023 4 papers
MRL
MARLlib: A Multi-agent Reinforcement Learning Library *
Siyi Hu, Yifan Zhong, Minquan Gao, Weixun Wang, Hao Dong, Xiaodan Liang, Zhihui Li, Xiaojun Chang, Yaodong Yang#
Journal of Machine Learning Research (JMLR)
MARLlib · Multi-Agent RL · Reinforcement Learning
MRL
On the complexity of computing Markov perfect equilibrium in general-sum stochastic games *
Xiaotie Deng, Ningyuan Li, David Mguni, Jun Wang, Yaodong Yang#
National Science Review
Nash Equilibrium · Stochastic Games
ALN
Safe multi-agent reinforcement learning for multi-robot control *
Shangding Gu, Jakub Grudzien Kuba, Yuanpei Chen, Yali Du, Long Yang, Alois C. Knoll, Yaodong Yang#
Artificial Intelligence Journal (AIJ)
Multi-Agent RL · Robotics · Reinforcement Learning
MRL
TorchOpt: An Efficient Library for Differentiable Optimization *
Jie Ren, Xidong Feng, Bo Liu, Xuehai Pan, Yao Fu, Luo Mai, Yaodong Yang#
Journal of Machine Learning Research (JMLR)
Differentiable Optimization
2021 1 paper
MRL
Diverse Auto-Curriculum is Critical for Successful Real-World Multiagent Learning Systems *
Yaodong Yang, Jun Luo, Ying Wen, Oliver Slumbers, Daniel Graves, Haitham Bou Ammar, Jun Wang, Matthew E. Taylor
AAMAS 2021 ★ Best Paper
Auto-Curriculum · Multi-Agent RL
2020 1 paper
EMB
SMARTS: An Open-Source Scalable Multi-Agent RL Training School for Autonomous Driving
Ming Zhou, Jun Luo, Julian Villella, Yaodong Yang, David Rusu, Jiayu Miao, Weinan Zhang, Montgomery Alban, Iman Fadakar, Zheng Chen, Aurora Chongxi Huang, Ying Wen, Kimia Hassanzadeh, Daniel Graves, Dong Chen, Zhengbang Zhu, Nhat Nguyen, Mohamed Elsayed, Kun Shao, Sanjeevan Ahilan, Baokuan Zhang, Jiannan Wu, Zhengang Fu, Kasra Rezaee, Peyman Yadmellat, Mohsen Rohani, Nicolas Perez Nieves, Yihan Ni, Seyedershad Banijamali, Alexander Cowen Rivers, Zheng Tian, Daniel Palenicek, Haitham bou Ammar, Hongbo Zhang, Wulong Liu, Jianye Hao, Jun Wang
CoRL 2020 ★ Best Paper
SMARTS · Autonomous Driving · Multi-Agent RL
2018 1 paper
MRL
Mean Field Multi-Agent Reinforcement Learning
Yaodong Yang, Rui Luo, Minne Li, Ming Zhou, Weinan Zhang, Jun Wang
ICML 2018
Mean Field RL · Multi-Agent RL · Reinforcement Learning

Service

Area Chair · Associate Editor · Program Chair

Area Chair
  • NeurIPS CCF-A
  • ICML CCF-A
  • ICLR CCF-A
  • AAAI CCF-A
  • IJCAI CCF-A
  • AAMAS — Senior AC CCF-B
  • IROS CCF-C
Associate Editor
  • Neural Networks (Elsevier) CCF-B
  • Transactions on Machine Learning Research TMLR
  • Scientific Reports Nature
Program / Publicity Chair
  • World Artificial Intelligence Conference Academic (WAICA) 2026 · Shanghai Publicity Chair
  • Distributed AI Conference (DAI) 2024 · Singapore Program Chair

Experience

USTC · Imperial · UCL · AIG · KCL · PKU

2022 – Now
Assistant Professor (Boya Young Scholar)
Peking University · Institute for AI 北京大学人工智能研究院
Chief Scientist, PKU–PsiBot Joint Laboratory · PI, PAIR-Lab
2021 – 2022
Assistant Professor
King's College London · Department of Informatics 伦敦国王学院
2019 – 2021
Principal Researcher
Huawei U.K. · London Research Centre 华为英国研究院
2020 Best Technology Breakthrough Award (sole awardee)
2015 – 2019
Senior Science Manager
American International Group (AIG) · Science Dept. 美国国际集团
2016 – 2021
Ph.D. · Computer Science
University College London (UCL) 伦敦大学学院
Thesis: Many-Agent Reinforcement Learning · Advisors: Jun Wang & John Shawe-Taylor
2013 – 2014
M.Sc. · Quantitative Biology
Imperial College London 伦敦帝国理工学院
2009 – 2013
B.Eng. · Electronic Engineering & Information Science
University of Science & Technology of China (USTC) 中国科学技术大学
§ Join the Lab

Come work on the hardest problems in safe and trustworthy AGI.

PhD admissions (2027 cycle)

Peking University
0 spots · FULL this cycle

Zhongguancun Academy
Multiple spots · Open · accepting applications
Three research directions

LLM Post-Training · Alignment

RLHF / DPO / Safe-RLHF · reward modeling · interpretability · multi-modal & multilingual safety. Connecting alignment theory to practice at scale.

Embodied Intelligence · Dexterous Manipulation · Robot Foundation Models

Sim-to-real policy learning for high-DoF dexterous manipulation; embodied foundation models that act in the physical world. Joint work with PsiBot.

World Models · Physics Foundation Models · Sim-to-Real Alignment

Build world models that capture both physical and social dynamics; align simulators with the real world for downstream policy training. Joint work with Neo Matrix.

PAIR-Lab also welcomes master's students, visiting scholars, undergraduate research interns, and postdocs. If you are fascinated by reinforcement learning, LLM alignment, multi-agent systems, or embodied intelligence — and want to build safe and trustworthy AGI that ships — please read the starter materials above and reach out.