AI breaks human records in the Kissing Number Problem
PKU mathematicians used AI and reinforcement learning to explore the kissing number problem, achieving breakthroughs in higher dimensions.
Dr. Yaodong Yang is an Assistant Professor (Boya Young Scholar) at the Institute for Artificial Intelligence, Peking University, and Chief Scientist of the PKU–PsiBot Joint Laboratory. His research focuses on experiential learning and alignment for AI and embodied agents, aiming to advance the trustworthy deployment and real-world alignment of large models; it spans reinforcement learning, AI alignment, and embodied intelligence.
He has published over 200 papers in leading journals and conferences, including Nature Machine Intelligence, Matter (Cell Press), Artificial Intelligence Journal, and IEEE TPAMI, with more than 15,000 Google Scholar citations. Since 2022, he has been ranked as the top scholar in AI & Machine Learning at Peking University according to CSRankings.
Dr. Yang has received numerous honors, including the ACL 2025 Best Paper Award, UKRI 2026 Best Paper Award in AI & Robotics, ICCV 2023 Best Paper Initial List, CoRL 2020 Best System Paper Award, and the AAMAS 2021 Blue Sky Idea Award. He was named to the MIT Technology Review "AI 100 Young Innovators", the 2025 Forbes China Technology & Innovation Innovative Leader list, received the WAIC 2022 "Yunfan Star Award", and the ACM SIGAI China Rising Star Award. His work has been featured by CCTV, Xinhua News, the National Natural Science Foundation of China (NSFC), and MIT Technology Review.
He serves as an Area Chair for major conferences including ICML, ICLR, NeurIPS, AAAI, IJCAI, AAMAS, and IROS, and as an Associate Editor for Scientific Reports, Transactions on Machine Learning Research, and Neural Networks.
Previously, Dr. Yang was an Assistant Professor at King's College London, a Principal Researcher at Huawei Research U.K., and a Senior Manager at AIG. He received his B.Sc. from the University of Science and Technology of China, M.Sc. from Imperial College London, and Ph.D. from University College London, where he was the university's sole nominee for the ACM SIGAI Doctoral Dissertation Award.
Headlines · recent updates
PKU mathematicians used AI and reinforcement learning to explore the kissing number problem, achieving breakthroughs in higher dimensions.
Joint work with PKU–PsiBot Lab. A generalist world-action model for embodied agents, outperforming prior SOTA on spatial reasoning benchmarks.
The paper shows that post-aligned language models tend to revert to their pre-training distributions — a theoretical "elasticity" result with implications for RLHF and safety.
A comprehensive ICML tutorial covering RLHF, DPO, safe alignment, preference learning and super-alignment — delivered to a virtual audience of thousands.
A cross-disciplinary work applying LLMs to steer autonomous experimental synthesis of carbon nanotubes, featured in Cell Press's flagship materials journal Matter.
The first multi-agent RL paper led by a Chinese team published in a Nature family journal. A scalable method for controlling 1000+ networked agents, with real-world deployments.
7 papers accepted at ACL 2026.
9 papers accepted at AAAI 2026 / ICLR 2026 / AAMAS 2026 / ICRA 2026.
11 papers accepted at NeurIPS 2025 (2 Spotlights).
6 papers accepted at ACL 2025; 2 papers accepted at ICML 2025.
5 papers accepted at ICLR 2025.
5 papers accepted at AAAI 2025; 2 at AAMAS 2025.
Invited talk "Can LLMs be aligned?" at CNCC 2024.
5 papers accepted at NeurIPS 2024.
2 papers accepted at CoRL 2024.
Delivered the VALSE 2024 annual progress report on alignment; 3 papers accepted at ICML 2024.
Featured on CCTV's "Focus Report" (焦点访谈) — national TV report on AI safety.
5 papers accepted at ICLR 2024; 1 at TPAMI.
3 papers accepted at AAAI 2024.
Released the AI Alignment Survey.
Paper on the ICCV 2023 Best Paper Initial List (top 17 / 8260).
6 papers accepted at NeurIPS 2023; 2 at JMLR and TMLR.
TorchOpt officially joined the PyTorch Ecosystem.
4 papers accepted at ICML 2023.
2 papers accepted at ICRA 2023; 1 at ICLR 2023.
1 paper accepted at JAAMAS and 1 at AAMAS 2023.
NeurIPS 2022 MyoChallenge — 1st place (1 / 340 teams).
National Science Review paper on Nash equilibrium complexity; 3 papers accepted at AAAI 2023.
7 papers accepted at NeurIPS 2022.
1 paper accepted at IJCAI 2022.
TorchOpt and Bi-DexHands open-sourced.
2 papers accepted at ICLR 2022.
SMARTS platform released; CoRL 2020 Best System Paper Award.
1 paper accepted at ICML 2020.
1 paper accepted at IJCAI 2020.
1 paper accepted at AAMAS 2020.
Five directions · methods, benchmarks, and open-source systems
RLHF, preference learning, safe alignment, red-teaming and interpretability. Principled methods and open benchmarks — BeaverTails, PKU-SafeRLHF, Stream Aligner, Libra-Leaderboard — to make LLMs robustly helpful and harmless.
Dexterous manipulation, vision-language-action models, and sim-to-real. From Bi-DexHands and ClutterDexGrasp to DexGraspVLA and Safe VLA — pursuing human-level generalist robotic agents.
Cooperative and competitive MARL, policy gradient theory, Nash equilibria. HARL, MAT, MARLlib — algorithms that scale to hundreds of agents.
LLM-based agents for macroeconomic modelling, social value orientation, negotiation and consensus. World models unifying physical and social dynamics.
RL and LLMs applied to medicine, physics, materials (carbon-nanotube synthesis), and operations — featured in Cell iScience, Matter, and National Science Review.
National coverage · CCTV · Xinhua · NSFC · MIT Tech Review
Best papers · talent programs · academic honors · competitions
Efficient and Scalable Reinforcement Learning for Large-Scale Network Control · Nature Machine Intelligence
Language Models Resist Alignment: Evidence From Data Compression
UniDexGrasp: Universal Robotic Dexterous Grasping via Learning Diverse Proposal Generation and Goal-Conditioned Policy
Diverse Auto-Curriculum is Critical for Successful Real-World Multiagent Learning Systems
SMARTS: An Open-Source Scalable Multi-Agent RL Training School for Autonomous Driving
NSFC Excellent Young Scientist (Overseas)
Ministry of Human Resources — 30 nationwide
CAAI — 6 selected nationally
Global Top 2% career-impact ranking
MIT Technology Review "AI 100 Young Innovators"
Forbes China Technology & Innovation Innovative Leader
Sole awardee of the year
10 awardees nationwide
Wu Wenjun AI Science & Technology Award · Science and Technology Progress Award, Second Prize: Key Technologies and Applications of Knowledge-Enhanced Trustworthy Multimodal Interaction
China Meteorological Service Association — Meteorological Technology Invention Award, First Prize (project: Navigation Route Planning for Extreme-Wind Meteorological Emergency Rescue Integrating BeiDou and AI).
Physiological dexterity manipulation · 1 / 340 teams
Digital China Innovation Competition — AI Track, National First Prize
Highest PKU student honors · Apple & Tencent fellowships · NSFC grants
For the course "Foundations and Alignment of Large Language Models" (《大语言模型基础与对齐》).
2025 Digital China Innovation Competition — AI Track, First Prize (national).
ICBC Teaching Award · Peking University, 2025
Yuanpei College "General AI Experimental Class" — Class of 2022 head advisor · Teaching Committee member
Awarded three years in a row (2023, 2024, 2025) by Peking University.
Representative works · browse by topic below
Area Chair · Associate Editor · Program Chair
USTC · Imperial · UCL · AIG · KCL · PKU
Open admission cycle — check below before contacting
Sim-to-real policy learning for high-DoF dexterous manipulation; embodied foundation models that act in the physical world. Joint work with PsiBot.
Build world models that capture both physical and social dynamics; align simulators with the real world for downstream policy training. Joint work with Neo Matrix.
RLHF / DPO / Safe-RLHF · reward modeling · interpretability · multi-modal & multilingual safety. Connecting alignment theory to practice at scale.
PAIR-Lab also welcomes master's students, visiting scholars, undergraduate research interns, and postdocs. If you are fascinated by reinforcement learning, LLM alignment, multi-agent systems, or embodied intelligence — and want to build safe and trustworthy AGI that ships — please read the starter materials above and reach out.