AI breaks human records in the Kissing Number Problem
PKU mathematicians used AI and reinforcement learning to explore the kissing number problem, achieving breakthroughs in higher dimensions.
Dr. Yaodong Yang is an Assistant Professor (Boya Young Scholar) at the Institute for Artificial Intelligence, Peking University, and Chief Scientist of the PKU–PsiBot Joint Laboratory. His research focuses on learning from experience and the alignment of AI and embodied agents, spanning reinforcement learning, AI alignment, and embodied intelligence, with the aim of advancing the trustworthy deployment and real-world alignment of large models.
He has published over 200 papers in leading journals and conferences, including Nature Machine Intelligence, Cell Matter, Artificial Intelligence Journal, and IEEE TPAMI, with more than 16,000 Google Scholar citations. Since 2022, he has been ranked as the top scholar in AI & ML at Peking University according to CSRankings.
Dr. Yang has received numerous honors, including the ACL 2025 Best Paper Award, UKRI 2026 Best Paper Award in AI, ICCV 2023 Best Paper Finalist, CoRL 2020 Best System Paper Award, and the AAMAS 2021 Blue Sky Idea Award.
He was named to the MIT Technology Review "AI 100 Young Innovators", the 2025 Forbes China Technology & Innovation Innovative Leader list, received the WAIC 2022 "Yunfan Star Award", and the ACM SIGAI China Rising Star Award. His work has been featured by CCTV, People's Daily, Xinhua News, the National Natural Science Foundation of China (NSFC), and MIT Technology Review.
He serves as an Area Chair for major conferences including ICML, ICLR, NeurIPS, AAAI, IJCAI, AAMAS, and IROS, and as an Associate Editor for Scientific Reports, Transactions on Machine Learning Research, and Neural Networks.
Previously, Dr. Yang was an Assistant Professor at King's College London, a Principal Researcher at Huawei Research U.K., and a Senior Manager at AIG. He received his B.Sc. from the University of Science and Technology of China, M.Sc. from Imperial College London, and Ph.D. from University College London, where he was the university's sole nominee for the ACM SIGAI Doctoral Dissertation Award.
Headlines · recent updates
Joint work with PKU–PsiBot Lab. A generalist world-action model for embodied agents, outperforming prior SOTA on spatial reasoning benchmarks.
The paper shows that post-aligned language models tend to revert to their pre-training distributions — a theoretical "elasticity" result with implications for RLHF and safety.
A comprehensive ICML tutorial covering RLHF, DPO, safe alignment, preference learning and super-alignment — delivered to a virtual audience of thousands.
A cross-disciplinary work applying LLMs to steer autonomous experimental synthesis of carbon nanotubes, featured in Cell Press's flagship materials journal Matter.
The first multi-agent RL paper led by a Chinese team to appear in a Nature sister journal. A scalable method for controlling 1,000+ networked agents, with real-world deployments.
Invited talk "Can LLMs be aligned?" at CNCC 2024.
Featured on CCTV's「焦点访谈」("Focus Report") — national TV coverage of AI safety.
Released the AI Alignment Survey.
TorchOpt officially joined the PyTorch Ecosystem.
NeurIPS 2022 MyoChallenge — 1st place (1 / 340 teams).
TorchOpt and Bi-DexHands open-sourced.
Five directions · methods, benchmarks, and open-source systems
RLHF, preference learning, safe alignment, red-teaming and interpretability. Principled methods and open benchmarks — BeaverTails, PKU-SafeRLHF, Stream Aligner, Libra-Leaderboard — to make LLMs robustly helpful and harmless.
Dexterous manipulation, vision-language-action models, and sim-to-real. From Bi-DexHands and ClutterDexGrasp to DexGraspVLA and Safe VLA — pursuing human-level generalist robotic agents.
Cooperative and competitive MARL, policy gradient theory, Nash equilibria. HARL, MAT, MARLlib, MALib — algorithms that scale to hundreds of agents.
LLM-based agents for macroeconomic modelling, social value orientation, negotiation and consensus. World models unifying physical and social dynamics.
RL and LLMs applied to mathematics, medicine, physics, materials (carbon-nanotube synthesis), and operations — featured in Cell iScience, Matter, and National Science Review.
National coverage · CCTV · Xinhua · NSFC · MIT Tech Review
Best papers · talent programs · academic honors · competitions
Efficient and Scalable Reinforcement Learning for Large-Scale Network Control · Nature Machine Intelligence
Language Models Resist Alignment: Evidence From Data Compression
UniDexGrasp: Universal Robotic Dexterous Grasping via Learning Diverse Proposal Generation and Goal-Conditioned Policy
Diverse Auto-Curriculum is Critical for Successful Real-World Multiagent Learning Systems
SMARTS: An Open-Source Scalable Multi-Agent RL Training School for Autonomous Driving
NSFC Excellent Young Scientist
Ministry of Human Resources — 30 nationwide
CAAI — 6 selected nationally
Global Top 2% career-impact ranking
MIT Technology Review · "AI 100 Young Innovators"
Forbes China · Innovation & Tech Leaders
ACM SIGAI China · 3 awardees nationwide
WAIC · 10 awardees nationwide
Wu Wenjun AI S&T Award · 2nd Prize — Knowledge-Enhanced Trustworthy Multimodal Interaction
CMSA · 1st Prize for Technological Invention — BeiDou + AI for Extreme-Wind Emergency Navigation
Physiologically realistic dexterous manipulation · 1 / 340 teams
Digital China Innovation Contest · AI Track · National 1st Prize
Highest PKU student honors · Apple & Tencent fellowships · NSFC grants
For the course "Foundations and Alignment of Large Language Models" (《大语言模型基础与对齐》).
2025 Digital China Innovation Competition · AI Track · National First Prize.
ICBC Teaching Award · PKU · 2025
Yuanpei College · Class Advisor & Curriculum Committee · AGI Experimental Class (2022 cohort)
Awarded three years in a row (2023, 2024, 2025) by Peking University.
Representative works · browse by topic below
Area Chair · Associate Editor · Program Chair
USTC · Imperial · UCL · AIG · KCL · PKU
RLHF / DPO / Safe-RLHF · reward modeling · interpretability · multi-modal & multilingual safety. Connecting alignment theory to practice at scale.
Sim-to-real policy learning for high-DoF dexterous manipulation; embodied foundation models that act in the physical world. Joint work with PsiBot.
Build world models that capture both physical and social dynamics; align simulators with the real world for downstream policy training. Joint work with Neo Matrix.
PAIR-Lab also welcomes master's students, visiting scholars, undergraduate research interns, and postdocs. If you are fascinated by reinforcement learning, LLM alignment, multi-agent systems, or embodied intelligence — and want to build safe and trustworthy AGI that ships — please read the starter materials above and reach out.