Ruiwen WANG — AI Infra & LLM Pretraining

About

I am a third-year PhD candidate in Computer Science at Sorbonne Université and EURECOM, working at the intersection of AI systems, parallel computing and performance engineering, jointly advised by Dr. Raja Appuswamy (EURECOM) and Dr. Chong Li (Huawei Paris).

I read Computer Science at Université Paris-Saclay (BSc, 2016–2019), where a machine-learning project class taught by Isabelle Guyon and Zhengying Liu first drew me toward the field, then took my MSc at Sorbonne Université (2019–2022) under Profs. Emmanuel Chailloux and Jacques Malenfant.

I joined Huawei Paris in 2021 as a research engineer in AI infrastructure, working on hybrid parallelism and memory optimisation for the LLM pretraining stack behind the company's foundation-scale models. In 2023 the work evolved into a PhD under the French CIFRE programme; the planners studied in the thesis are integrated into Huawei's MindSpore / D-Rec runtime and validated on Ascend-910 production clusters of 10K+ NPUs. I expect to defend in 2026.

Research focus

Scaling AI is increasingly a systems problem. A frontier model is trained by spreading one computation across thousands of accelerators for days or weeks at a time, and how that computation is partitioned and scheduled across them — the parallelism strategy — largely sets how much of the hardware is actually used, and with it the time, energy and cost of a run.

Today that strategy is mostly found by hand: re-tuned for each model and cluster through profiling and expert intuition, and redone whenever the hardware or model changes. I want to know whether it can instead be reasoned about from first principles:

How do we systematically plan parallelism for foundation-scale training without per-cluster profiling, and without losing portability when the hardware or model changes?

My approach builds closed-form cost models that predict how a strategy will behave before it runs — so a good plan can be found analytically, then carried across hardware and model changes rather than rebuilt each time. The planners that come out of this are described under Systems.

News

2026.01

Presented PRISM at SCA / HPCAsia 2026, Osaka.

2026.01

Co-author on MLLM Pipeline Bubble Modeling, presented at SCA / HPCAsia 2026, Osaka.

2025.11

Presented ManuMatic at IFIP NPC 2025, Nha Trang.

2025.09

Presented BMPipe at IEEE CLUSTER 2025, Edinburgh.

2025.08

H²O received the Best Poster Award at Euro-Par 2025, Dresden.

2025.06

Presented SCOPE at the DP2E-AI 2025 Workshop, Paris.

2025.01

Patent on automatic ML execution-plan generation published as WO 2025/020165 A1 (PCT, Huawei Technologies).

Topics

hybrid parallelism · distributed training · symbolic cost modelling · AI Infra · large-scale LLM · LLM pretraining · pipeline parallelism · automatic planning · Mixture of Experts · Ascend NPU · portability · ILP optimisation · tensor parallelism · activation recomputation · pipeline bubbles · memory modelling · communication cost · strategy injection · expert parallelism · MFU optimisation · transformer training

Systems

Four planners built during my PhD, each turning one research thread into a strategy planner you can run.

The planners are named for what they do: PRISM (symbolic memory), BMPipe (bubble–memory), H²O (hyper-parameter optimisation), ManuMatic (manual + automatic strategy).

Three of these — PRISM, BMPipe and H²O — are combined in a live interactive demo: SAPP Search Studio →

PRISM

SCA / HPCAsia 2026

Profiling-free symbolic memory-driven strategy planner. Predicts memory under arbitrary recomputation policies via closed-form per-component expressions; the communication side is modelled with collective-volume and topology-aware templates.

92–96% memory-prediction accuracy (median error ≈ 7%) across DeepSeek3 / Llama2-3 / Mistral / TextHawk / Llama-MoE on Ascend-910 up to 1,024 NPUs; up to 1.43× MFU speedup over Megatron-LM.

→ SAPP demo
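
As a rough illustration of what a closed-form, per-component memory expression can look like, the Python sketch below uses two widely cited approximations rather than PRISM's own model: 16 bytes per parameter for mixed-precision Adam model states, and the standard per-layer transformer activation estimate for 16-bit activations under tensor parallelism. The function names, the two recomputation policies ("none" and "full") and the example sizes are assumptions made for this illustration only.

```python
def model_state_bytes(n_params, tp=1, pp=1):
    """Weights, gradients and Adam states held by one device.

    Assumes 2 B weights + 2 B gradients + 12 B fp32 optimizer states per
    parameter, split evenly across tensor- and pipeline-parallel ranks.
    """
    return 16 * n_params / (tp * pp)


def activation_bytes_per_layer(seq, micro_batch, hidden, heads, tp=1,
                               recompute="none"):
    """Activation bytes kept by one transformer layer for one micro-batch.

    recompute="none": store everything, s*b*h*(10 + 24/t + 5*a*s/(h*t)).
    recompute="full": store only the layer input, 2*s*b*h.
    """
    sbh = seq * micro_batch * hidden
    if recompute == "full":
        return 2 * sbh
    if recompute == "none":
        return sbh * (10 + 24 / tp + 5 * heads * seq / (hidden * tp))
    raise ValueError(f"unknown recomputation policy: {recompute}")


# Hypothetical 7B-parameter model, 32 layers, one pipeline stage, tp=1.
states = model_state_bytes(7_000_000_000)
for policy in ("none", "full"):
    acts = 32 * activation_bytes_per_layer(2048, 1, 4096, 32, recompute=policy)
    print(policy, round((states + acts) / 2**30, 1), "GiB")
```

Because every term is a formula in the model and parallelism parameters, a planner built this way can compare recomputation policies without launching the workload; the real expressions would need to cover more components than this sketch does.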

BMPipe

IEEE CLUSTER 2025

Bubble–memory co-optimisation strategy planner. Formalises pipeline idle time into a tri-category taxonomy — recomputation, imbalance, preparation bubbles — and solves a joint pipeline–memory ILP for very-large DNN training.

1.36× speedup over Megatron-Even on a 10K+ NPU Ascend-910 cluster; up to 1.70× under 1F1B-Interleave; the ILP solves in under 200 ms at production scale.

→ SAPP demo
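
The trade-off between pipeline bubbles and memory can be illustrated with a toy model. The sketch below is not BMPipe's formulation: it uses a common first-order approximation of pipeline iteration time, (m - 1) * max(stage time) + sum(stage times) for m micro-batches, and replaces the ILP with brute-force enumeration of contiguous layer splits, which is only practical for a handful of layers. All names and numbers are hypothetical.

```python
from itertools import combinations


def iteration_time(stage_times, n_microbatches):
    """First-order estimate: the slowest stage paces the steady state."""
    return (n_microbatches - 1) * max(stage_times) + sum(stage_times)


def bubble_fraction(stage_times, n_microbatches):
    """Share of the iteration in which an average stage sits idle."""
    total = iteration_time(stage_times, n_microbatches)
    useful = n_microbatches * sum(stage_times) / len(stage_times)
    return 1.0 - useful / total


def best_partition(layer_times, layer_mem, n_stages, n_microbatches, mem_budget):
    """Try every contiguous split; keep the fastest one that fits in memory."""
    n = len(layer_times)
    best = None
    for cuts in combinations(range(1, n), n_stages - 1):
        bounds = (0, *cuts, n)
        spans = list(zip(bounds, bounds[1:]))
        if any(sum(layer_mem[a:b]) > mem_budget for a, b in spans):
            continue  # some stage exceeds its memory budget
        times = [sum(layer_times[a:b]) for a, b in spans]
        cost = iteration_time(times, n_microbatches)
        if best is None or cost < best[0]:
            best = (cost, spans)
    return best


layer_times = [1, 1, 1, 1, 2, 2]   # made-up per-layer compute times
layer_mem = [1, 1, 1, 1, 1, 1]     # made-up per-layer memory units
print(best_partition(layer_times, layer_mem, 2, 4, mem_budget=4))
print(best_partition(layer_times, layer_mem, 2, 4, mem_budget=3))
```

With the looser budget the balanced split is feasible; tightening the budget forces an imbalanced split and a longer iteration, which is the kind of coupling a joint pipeline and memory solver has to handle.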

H²O

Euro-Par 2025 · Best Poster

Holistic hyper-parameter optimisation. A two-level optimiser: the outer level searches the macro tuple (DP, TP, PP, OP, micro-batch) with a fast symbolic Delta cost model, and the inner level refines layer-to-stage assignment and per-layer recomputation via ILP. Together they form a joint solver spanning all macro degrees and stage assignment simultaneously.

+36.7% speedup over the D-Rec baseline on 128-device clusters; 35.7% MFU on DeepSeek-141B — without any accelerator-time profiling.

→ SAPP demo
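
The two-level structure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not H²O itself: the OP dimension is omitted, `toy_cost` is a made-up placeholder standing in for the Delta cost model, and the inner level is reduced to an optional `refine` callback where a real implementation would run the ILP. All function names are hypothetical.

```python
from itertools import product


def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]


def macro_candidates(n_devices, global_batch, micro_batches=(1, 2, 4)):
    """Yield every (dp, tp, pp, micro_batch) whose degrees use all devices."""
    for tp, pp in product(divisors(n_devices), repeat=2):
        if n_devices % (tp * pp):
            continue
        dp = n_devices // (tp * pp)
        for mb in micro_batches:
            if global_batch % (dp * mb) == 0:
                yield {"dp": dp, "tp": tp, "pp": pp, "micro_batch": mb}


def toy_cost(c, global_batch):
    """Placeholder analytic cost with invented penalty factors."""
    n_micro = global_batch // (c["dp"] * c["micro_batch"])
    pipeline = (n_micro + c["pp"] - 1) / n_micro   # pipeline fill/drain factor
    tp_penalty = 1.0 + 0.10 * (c["tp"] - 1)        # made-up communication cost
    dp_penalty = 1.0 + 0.02 * (c["dp"] - 1)        # made-up gradient sync cost
    return pipeline * tp_penalty * dp_penalty


def two_level_search(n_devices, global_batch, top_k=3, refine=None):
    """Outer level: rank all macro tuples. Inner level: refine the top-K."""
    refine = refine or (lambda c: toy_cost(c, global_batch))
    ranked = sorted(macro_candidates(n_devices, global_batch),
                    key=lambda c: toy_cost(c, global_batch))
    shortlist = ranked[:top_k]
    return sorted(((refine(c), c) for c in shortlist), key=lambda pair: pair[0])


for cost, cand in two_level_search(8, 64):
    print(round(cost, 4), cand)
```

The point of the split is that the cheap outer pass prunes the search space, so the more expensive inner solver only runs on a short list of candidates.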

ManuMatic

IFIP NPC 2025

Strategy injection for robust automatic hybrid parallelism. A structured interface lets users pin a small set of performance-critical operators while the planner derives a globally consistent strategy for the rest of the graph; built atop the D-Rec planner inside MindSpore, and reduces exactly to D-Rec when no constraints are provided.

Up to 2.24× over D-Rec on Mixtral-8×7B with expert parallelism (8 NPUs); 2.04× on Llama3-8B at 8K context; 1.45× / 1.30× on Qwen2.5-72B at 32K context (64 NPUs, combined with BMPipe).
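
To make the idea of pinning a few operators concrete, here is a toy sketch in Python. It is not the ManuMatic or D-Rec algorithm: it assumes the graph is a simple chain of operators, that each operator has a known cost per strategy, and that switching strategy between neighbours costs a fixed resharding penalty. A small dynamic program then picks the cheapest consistent assignment, with pinned operators restricted to the user's choice. The function `plan_chain` and all costs are hypothetical.

```python
def plan_chain(op_costs, reshard_cost, pinned=None):
    """Cheapest strategy per operator along a chain.

    op_costs: list of {strategy: cost} dicts, one per operator.
    pinned:   optional {op_index: strategy} constraints from the user.
    Returns (total_cost, [strategy per operator]).
    """
    pinned = pinned or {}
    # A pinned operator keeps only the strategy the user selected.
    allowed = [
        {pinned[i]: costs[pinned[i]]} if i in pinned else costs
        for i, costs in enumerate(op_costs)
    ]
    # best[s] = (cost, path) of the cheapest plan ending in strategy s.
    best = {s: (c, [s]) for s, c in allowed[0].items()}
    for costs in allowed[1:]:
        nxt = {}
        for s, c in costs.items():
            prev_cost, prev_path = min(
                ((pc + (reshard_cost if ps != s else 0), path)
                 for ps, (pc, path) in best.items()),
                key=lambda item: item[0],
            )
            nxt[s] = (prev_cost + c, prev_path + [s])
        best = nxt
    return min(best.values(), key=lambda item: item[0])


ops = [{"dp": 1, "tp": 3}, {"dp": 5, "tp": 2}, {"dp": 1, "tp": 3}]
print(plan_chain(ops, reshard_cost=1))                   # no constraints
print(plan_chain(ops, reshard_cost=1, pinned={0: "tp"})) # first operator pinned
```

With an empty `pinned` mapping the function returns the unconstrained optimum, mirroring the property that the planner falls back to its baseline behaviour when no constraints are given.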

Demo

An interactive integration of three planners from the Systems section — PRISM (memory cost surface), BMPipe (pipeline-bubble ILP) and H²O (two-level search) — running as a single browser-based research tool.

SAPP Search Studio

Unified ND + PPB strategy search for large-scale LLM training

sapp.ruiwen.wang

Open live demo ↗

What you'll see

Configure a strategy search: framework and device, top-K candidates, PPB time and memory budgets, search dimensions.
Launch the inner PPB solver — BMPipe's bubble–memory ILP — under tight memory constraints.
Inspect the ranked candidate table: predicted end time, ND peak memory, PPB solver peak, perf score, feasibility.
Generate per-candidate pipeline timeline visualisations and explore strategies layer by layer.

Inside the box

Memory cost surface — PRISM's profiling-free symbolic model evaluates memory and communication for each candidate without running the workload.
Pipeline-bubble solver — BMPipe's joint pipeline–memory ILP segments layers under tight memory budgets and selects per-layer recomputation.
Top-K refinement — H²O's two-level decomposition: a fast macro-degree search yields top-K candidates, refined by the inner solver.

The workspace opens empty. Click Launch search study to run a real solver pass and populate the timeline and candidate table — the demo executes actual workloads server-side.

Publications

Ruiwen Wang, Philippe Fang, Chong Li, Thibaut Tachon, Raja Appuswamy.

PRISM: Profiling-Free Symbolic Memory-Driven Strategy Planner for Large DNN Model Training.

SCA / HPCAsia 2026, Osaka, Japan · Jan 26–29, 2026

Zhengdao Yu, Ruiwen Wang, Nelson Lossing, Chong Li.

MLLM Pipeline Bubble Modeling for Large-Scale Training.

SCA / HPCAsia 2026, Osaka, Japan · Jan 26–29, 2026

Xinzhang Liu, Chao Wang, Zhihua Yang, Zhuo Jiang, …, Ruiwen Wang, et al.

Training Report of TeleChat3-MoE.

arXiv preprint · 2025

Ruiwen Wang, Chong Li, Hongxing Wang, Raja Appuswamy, Yujie Yuan.

ManuMatic: Strategy Injection for Robust Automatic Hybrid Parallelism in Distributed DNN Training.

22nd IFIP NPC 2025, Nha Trang, Vietnam · Nov 14–16, 2025

Ruiwen Wang, Chong Li, Thibaut Tachon, Raja Appuswamy, Teng Su.

BMPipe: Bubble-Memory Co-optimization Strategy Planner for Very-Large DNN Training.

27th IEEE CLUSTER 2025, Edinburgh, UK · Sep 2–5, 2025

Ruiwen Wang, Chong Li, Raja Appuswamy, Yujie Yuan.

H²O: Holistic Hyper-Parameter Optimization for Large-Scale Deep Neural Network Training.

31st Euro-Par 2025, Dresden, Germany · Aug 25–29, 2025 · Best Poster Award

Ruiwen Wang, Chong Li, Thibaut Tachon, Raja Appuswamy.

SCOPE — Symbolic Computation-Memory Optimization for Pipeline Efficiency in Ultra-Scale DNN Training.

DP2E-AI 2025 Workshop, Paris, France · Jun 2–6, 2025

Patents

Chong Li, Pierre Leca, Thibaut Tachon, Ruiwen Wang, Haoran Wang.

Devices and methods for generating execution plans for a machine learning model.

WO 2025/020165 A1 · PCT/CN2023/109533 · Huawei Technologies · filed Jul 27, 2023 · published Jan 30, 2025

People

The people behind this work — advisors, mentors, co-authors and friends.

PhD supervisors

Dr. Raja Appuswamy — Assistant Professor (HDR), EURECOM · director
Dr. Chong Li — Principal Researcher, Huawei Paris · co-director

MSc advisors

Prof. Emmanuel Chailloux — Professor, Sorbonne Université / LIP6
Prof. Jacques Malenfant — Professor, Sorbonne Université / LIP6

Undergraduate mentors

Prof. Isabelle Guyon — Research Director, Google DeepMind · Chaired Professor, Université Paris-Saclay (LISN)
Dr. Viviane Pons — Associate Professor, Université Paris-Saclay (LISN)
Dr. Zhengying Liu — Researcher, Moonshot AI · PhD, Université Paris-Saclay (LRI)

Co-authors

Thibaut Tachon — Research Engineer, Huawei Paris · PRISM · BMPipe · H²O · SCOPE
Philippe Fang — Research Engineer, Huawei Paris · PRISM
Zhengdao Yu — PhD candidate (CIFRE), Huawei Paris × Sorbonne Université · MLLM Pipeline Bubble
Nelson Lossing — Research Engineer (HPC), Huawei Paris · MLLM Pipeline Bubble

Friend

Quentin Petit — Postdoctoral Researcher, Mines Paris – PSL · HPC / AI · formerly Huawei Paris

Experience & Education

Experience

2021– · Huawei Paris — Research Engineer · HPC and AI system optimisations for large-scale LLM training and inference, on Ascend-910 production clusters of 10K+ NPUs
2023– · Sorbonne Université × EURECOM — CIFRE Joint PhD Programme · Doctoral research in HPC / MLSys under joint academic and industrial supervision

Education

2023–26 · PhD in Computer Science — Sorbonne Université · EURECOM · with Huawei Paris · advised by Raja Appuswamy & Chong Li · expected
Thesis: Systematic and Portable Optimisation of Hybrid Parallelism for Large-Scale Distributed Training
2019–22 · MSc in Computer Science — Sorbonne Université, Paris · advised by Emmanuel Chailloux & Jacques Malenfant
2016–19 · BSc in Computer Science — Université Paris-Saclay · mentored by Isabelle Guyon, Viviane Pons & Zhengying Liu