Ruiwen WANG

PhD Candidate · AI Infra · LLM Pretraining · Hybrid Parallelism

Sorbonne Université × EURECOM × Huawei Paris · Paris, France

About

I am a third-year PhD candidate in Computer Science at Sorbonne Université and EURECOM, working at the intersection of AI systems, parallel computing and performance engineering, jointly advised by Dr. Raja Appuswamy (EURECOM) and Dr. Chong Li (Huawei Paris).

I read Computer Science at Université Paris-Saclay (BSc, 2016–2019), where a machine-learning project class taught by Isabelle Guyon and Zhengying Liu first drew me toward the field, then took my MSc at Sorbonne Université (2019–2022) under Profs. Emmanuel Chailloux and Jacques Malenfant.

I joined Huawei Paris in 2021 as a research engineer in AI infrastructure, working on hybrid parallelism and memory optimisation for the LLM pretraining stack behind the company's foundation-scale models. In 2023 the work evolved into a PhD under the French CIFRE programme; the planners studied in the thesis are integrated into Huawei's MindSpore / D-Rec runtime and validated on Ascend-910 production clusters of 10K+ NPUs. I expect to defend in 2026.

Research focus

Scaling AI is increasingly a systems problem. A frontier model is trained by spreading one computation across thousands of accelerators for days or weeks at a time, and how that computation is partitioned and scheduled across them — the parallelism strategy — largely sets how much of the hardware is actually used, and with it the time, energy and cost of a run.

Today that strategy is mostly found by hand: re-tuned for each model and cluster through profiling and expert intuition, and redone whenever the hardware or model changes. I want to know whether it can instead be reasoned about from first principles:

How do we systematically plan parallelism for foundation-scale training without per-cluster profiling, and without losing portability when the hardware or model changes?

My approach builds closed-form cost models that predict how a strategy will behave before it runs — so a good plan can be found analytically, then carried across hardware and model changes rather than rebuilt each time. The planners that come out of this are described under Systems.

News

2026.01
Presented PRISM at SCA / HPCAsia 2026, Osaka.
2026.01
Co-author on MLLM Pipeline Bubble Modeling, presented at SCA / HPCAsia 2026, Osaka.
2025.11
Presented ManuMatic at IFIP NPC 2025, Nha Trang.
2025.09
Presented BMPipe at IEEE CLUSTER 2025, Edinburgh.
2025.08
H²O received the Best Poster Award at Euro-Par 2025, Dresden.
2025.06
Presented SCOPE at the DP2E-AI 2025 Workshop, Paris.
2025.01
Patent on automatic ML execution-plan generation published as WO 2025/020165 A1 (PCT, Huawei Technologies).

Topics

hybrid parallelism · distributed training · symbolic cost modelling · AI Infra · large-scale LLM · LLM pretraining · pipeline parallelism · automatic planning · Mixture of Experts · Ascend NPU · portability · ILP optimisation · tensor parallelism · activation recomputation · pipeline bubbles · memory modelling · communication cost · strategy injection · expert parallelism · MFU optimisation · transformer training

Systems

Four planners built during my PhD, each turning one research thread into a strategy planner you can run. The planners are named for what they do: PRISM (symbolic memory), BMPipe (bubble–memory), H²O (hyper-parameter optimisation), ManuMatic (manual + automatic strategy).

Three of these — PRISM, BMPipe and H²O — are combined in a live interactive demo: SAPP Search Studio →

PRISM

SCA / HPCAsia 2026

Profiling-free symbolic memory-driven strategy planner. Predicts memory under arbitrary recomputation policies via closed-form per-component expressions; the communication side is modelled with collective-volume and topology-aware templates.

92–96% memory-prediction accuracy (median error ≈ 7%) across DeepSeek3 / Llama2-3 / Mistral / TextHawk / Llama-MoE on Ascend-910 up to 1,024 NPUs; up to 1.43× MFU speedup over Megatron-LM.
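
To give a feel for what a closed-form, per-component memory expression looks like, here is a minimal sketch. It is not PRISM's model. The 16 bytes per parameter for mixed-precision Adam and the activation coefficients are commonly cited estimates for fp16 transformer layers (no sequence parallelism, no fused attention), the in-flight micro-batch count is a simplifying assumption, and every name (`ModelShape`, `peak_memory_bytes`, and so on) is hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelShape:
    layers: int   # number of transformer layers
    hidden: int   # hidden size h
    heads: int    # attention heads a
    seq: int      # sequence length s
    params: int   # total parameter count


def static_bytes(m: ModelShape, tp: int, pp: int) -> float:
    # Assumption: fp16 weights (2 B) + fp16 gradients (2 B) + fp32 Adam
    # master weights, momentum and variance (12 B) = 16 B per parameter,
    # sharded evenly over tensor- and pipeline-parallel ranks.
    return 16 * m.params / (tp * pp)


def activation_bytes_per_layer(m: ModelShape, micro_batch: int, tp: int,
                               recompute: bool) -> float:
    sbh = m.seq * micro_batch * m.hidden
    if recompute:
        # Full recomputation: keep only the fp16 layer input.
        return 2 * sbh
    # Assumed estimate for one fp16 transformer layer under tensor parallelism.
    return sbh * (10 + 24 / tp + 5 * m.heads * m.seq / (m.hidden * tp))


def peak_memory_bytes(m: ModelShape, micro_batch: int, tp: int, pp: int,
                      recompute: bool) -> float:
    layers_per_stage = math.ceil(m.layers / pp)
    # Assumption: the first 1F1B stage holds up to `pp` micro-batches in flight.
    in_flight = pp
    act = activation_bytes_per_layer(m, micro_batch, tp, recompute)
    return static_bytes(m, tp, pp) + layers_per_stage * in_flight * act
```

Because every term is an explicit function of the model shape and the parallel degrees, a planner can evaluate thousands of candidate strategies without touching an accelerator, which is the property the profiling-free approach relies on.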

BMPipe

IEEE CLUSTER 2025

Bubble–memory co-optimisation strategy planner. Formalises pipeline idle time into a tri-category taxonomy — recomputation, imbalance, preparation bubbles — and solves a joint pipeline–memory ILP for very-large DNN training.

1.36× speedup over Megatron-Even on a 10K+ NPU Ascend-910 cluster; up to 1.70× under 1F1B-Interleave; the ILP solves in under 200 ms at production scale.
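
The idea of trading pipeline balance against per-stage memory can be illustrated with a small sketch. It is not BMPipe's formulation: a dynamic program over contiguous layer splits stands in for the ILP, only the imbalance bubble is modelled, and the per-layer times, memory figures and function names are hypothetical. `ideal_bubble_fraction` is the standard idle fraction of a perfectly balanced 1F1B schedule.

```python
from functools import lru_cache
from typing import Optional, Sequence, Tuple


def ideal_bubble_fraction(stages: int, micro_batches: int) -> float:
    # Idle share of a balanced 1F1B schedule: (p - 1) / (m + p - 1).
    return (stages - 1) / (micro_batches + stages - 1)


def balance_stages(times: Sequence[float], mems: Sequence[float],
                   stages: int, mem_cap: float
                   ) -> Optional[Tuple[float, Tuple[int, ...]]]:
    """Split layers into contiguous stages, minimising the slowest stage
    while keeping every stage under mem_cap. Returns (bottleneck, cuts)
    or None if no split fits. Assumes non-negative memory values."""
    n = len(times)
    pt, pm = [0.0], [0.0]            # prefix sums of time and memory
    for t, m in zip(times, mems):
        pt.append(pt[-1] + t)
        pm.append(pm[-1] + m)

    @lru_cache(maxsize=None)
    def best(start: int, k: int):
        # Best split of layers[start:] into k non-empty stages.
        if k == 1:
            if pm[n] - pm[start] > mem_cap:
                return None
            return (pt[n] - pt[start], ())
        result = None
        for end in range(start + 1, n - k + 2):
            if pm[end] - pm[start] > mem_cap:
                break                # longer first stages only use more memory
            tail = best(end, k - 1)
            if tail is None:
                continue
            cand = max(pt[end] - pt[start], tail[0])
            if result is None or cand < result[0]:
                result = (cand, (end,) + tail[1])
        return result

    return best(0, stages)
```

With layer times `[1, 2, 3, 4]`, unit memory per layer and two stages, a loose cap lets the split fall after the third layer; tightening the cap to two layers per stage forces an even split and a slower bottleneck stage, which is the bubble-versus-memory tension in miniature.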

H²O

Euro-Par 2025 · Best Poster

Holistic hyper-parameter optimisation. A two-level optimiser: the outer level searches the macro tuple (DP, TP, PP, OP, micro-batch) with a fast symbolic Delta cost model; the inner level refines layer-to-stage assignment and per-layer recomputation via ILP. Together the two levels act as a joint solver spanning all macro degrees and stage assignment simultaneously.

+36.7% speedup over the D-Rec baseline on 128-device clusters; 35.7% MFU on DeepSeek-141B — without any accelerator-time profiling.
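
The two-level structure can be sketched generically: a cheap cost ranks every macro configuration, and only a shortlist is passed to a more expensive refinement. This is an illustration, not H²O's implementation. The OP degree is omitted, the toy cost functions are invented placeholders for the Delta cost model and the ILP, and all names are hypothetical.

```python
from itertools import product
from typing import Callable, Iterator, List, NamedTuple, Tuple

GLOBAL_BATCH = 16  # toy value used by the placeholder cost functions


class Macro(NamedTuple):
    dp: int
    tp: int
    pp: int
    micro_batch: int


def divisors(n: int) -> List[int]:
    return [d for d in range(1, n + 1) if n % d == 0]


def enumerate_macro(devices: int, global_batch: int) -> Iterator[Macro]:
    # Every (dp, tp, pp) with dp * tp * pp == devices, and every micro-batch
    # size that divides the per-replica batch.
    for dp, tp in product(divisors(devices), repeat=2):
        if devices % (dp * tp) or global_batch % dp:
            continue
        pp = devices // (dp * tp)
        for mb in divisors(global_batch // dp):
            yield Macro(dp, tp, pp, mb)


def two_level_search(devices: int, global_batch: int,
                     cheap_cost: Callable[[Macro], float],
                     refine: Callable[[Macro], float],
                     top_k: int = 5) -> Tuple[float, Macro]:
    # Outer level: rank all candidates with the cheap cost.
    shortlist = sorted(enumerate_macro(devices, global_batch),
                       key=cheap_cost)[:top_k]
    # Inner level: spend the expensive refinement only on the shortlist.
    return min((refine(m), m) for m in shortlist)


def toy_cheap(m: Macro) -> float:
    # Placeholder: ideal 1F1B bubble fraction plus a flat penalty per TP degree.
    n_micro = GLOBAL_BATCH // (m.dp * m.micro_batch)
    return (m.pp - 1) / (n_micro + m.pp - 1) + 0.05 * (m.tp - 1)


def toy_refine(m: Macro) -> float:
    # Placeholder: pretend the inner level recovers 10% of the estimated cost.
    return 0.9 * toy_cheap(m)
```

With these toy costs and no memory limit, pure data parallelism trivially wins; a realistic cost would reject configurations that do not fit in device memory, which is what makes the search non-trivial.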

ManuMatic

IFIP NPC 2025

Strategy injection for robust automatic hybrid parallelism. A structured interface lets users pin a small set of performance-critical operators while the planner derives a globally consistent strategy for the rest of the graph; built atop the D-Rec planner inside MindSpore, and reduces exactly to D-Rec when no constraints are provided.

Up to 2.24× over D-Rec on Mixtral-8×7B with expert parallelism (8 NPUs); 2.04× on Llama3-8B at 8K context; 1.45× / 1.30× on Qwen2.5-72B at 32K context (64 NPUs, combined with BMPipe).
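
Strategy injection can be pictured on a toy problem: a linear chain of operators, each with a few candidate layouts, where the user pins some operators and the planner chooses the rest so that the total compute plus resharding cost is minimal. The sketch below uses a Viterbi-style dynamic program over that chain. It is an assumption-laden illustration, not the ManuMatic or D-Rec algorithm, and `plan_chain`, the layout names and the costs are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple


def plan_chain(op_costs: List[Dict[str, float]],
               reshard_cost: Callable[[str, str], float],
               pins: Optional[Dict[int, str]] = None
               ) -> Tuple[float, List[str]]:
    """op_costs[i] maps each candidate layout of operator i to its cost.
    reshard_cost(a, b) is the cost of converting layout a to layout b
    between adjacent operators. pins fixes the layout of chosen operators.
    Returns (total_cost, layout_per_operator)."""
    pins = pins or {}

    def allowed(i: int) -> Dict[str, float]:
        # A pinned operator has exactly one admissible layout.
        if i in pins:
            return {pins[i]: op_costs[i][pins[i]]}
        return op_costs[i]

    # best[layout] = (cheapest cost of a prefix ending in layout, that prefix)
    best = {lay: (c, [lay]) for lay, c in allowed(0).items()}
    for i in range(1, len(op_costs)):
        nxt = {}
        for lay, c in allowed(i).items():
            cost, path = min(
                ((pc + reshard_cost(prev, lay), pp)
                 for prev, (pc, pp) in best.items()),
                key=lambda x: x[0],
            )
            nxt[lay] = (cost + c, path + [lay])
        best = nxt
    return min(best.values(), key=lambda x: x[0])


# Toy chain of three operators with two layouts each.
costs = [{"row": 1, "col": 4}, {"row": 3, "col": 1}, {"row": 1, "col": 5}]


def reshard(a: str, b: str) -> float:
    return 0 if a == b else 2


free_plan = plan_chain(costs, reshard)               # no user constraints
pinned_plan = plan_chain(costs, reshard, {1: "col"})  # user pins operator 1
```

With no pins the function returns the unconstrained optimum, mirroring the property that the constrained planner falls back to the base planner when no constraints are given. Pinning the middle operator changes only that choice, and the neighbours are re-derived around it.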

Demo

An interactive integration of three planners from the Systems section — PRISM (memory cost surface), BMPipe (pipeline-bubble ILP) and H²O (two-level search) — running as a single browser-based research tool.

SAPP Search Studio

Unified ND + PPB strategy search for large-scale LLM training

sapp.ruiwen.wang

Open live demo ↗

What you'll see

Inside the box

The workspace opens empty. Click Launch search study to run a real solver pass and populate the timeline and candidate table — the demo executes actual workloads server-side.

Publications

Ruiwen Wang, Philippe Fang, Chong Li, Thibaut Tachon, Raja Appuswamy.

PRISM: Profiling-Free Symbolic Memory-Driven Strategy Planner for Large DNN Model Training.

SCA / HPCAsia 2026, Osaka, Japan · Jan 26–29, 2026

Zhengdao Yu, Ruiwen Wang, Nelson Lossing, Chong Li.

MLLM Pipeline Bubble Modeling for Large-Scale Training.

SCA / HPCAsia 2026, Osaka, Japan · Jan 26–29, 2026

Xinzhang Liu, Chao Wang, Zhihua Yang, Zhuo Jiang, …, Ruiwen Wang, et al.

Training Report of TeleChat3-MoE.

arXiv preprint · 2025

Ruiwen Wang, Chong Li, Hongxing Wang, Raja Appuswamy, Yujie Yuan.

ManuMatic: Strategy Injection for Robust Automatic Hybrid Parallelism in Distributed DNN Training.

22nd IFIP NPC 2025, Nha Trang, Vietnam · Nov 14–16, 2025

Ruiwen Wang, Chong Li, Thibaut Tachon, Raja Appuswamy, Teng Su.

BMPipe: Bubble-Memory Co-optimization Strategy Planner for Very-Large DNN Training.

27th IEEE CLUSTER 2025, Edinburgh, UK · Sep 2–5, 2025

Ruiwen Wang, Chong Li, Raja Appuswamy, Yujie Yuan.

H²O: Holistic Hyper-Parameter Optimization for Large-Scale Deep Neural Network Training.

31st Euro-Par 2025, Dresden, Germany · Aug 25–29, 2025 · Best Poster Award

Ruiwen Wang, Chong Li, Thibaut Tachon, Raja Appuswamy.

SCOPE — Symbolic Computation-Memory Optimization for Pipeline Efficiency in Ultra-Scale DNN Training.

DP2E-AI 2025 Workshop, Paris, France · Jun 2–6, 2025

Patents

Chong Li, Pierre Leca, Thibaut Tachon, Ruiwen Wang, Haoran Wang.

Devices and methods for generating execution plans for a machine learning model.

WO 2025/020165 A1 · PCT/CN2023/109533 · Huawei Technologies · filed Jul 27, 2023 · published Jan 30, 2025

People

The people behind this work — advisors, mentors, co-authors and friends.

PhD supervisors

MSc advisors

Undergraduate mentors

Co-authors

Friend

Experience & Education

Experience

Education