About
I am a third-year PhD candidate in Computer Science at Sorbonne Université and EURECOM, working at the intersection of AI systems, parallel computing and performance engineering, jointly advised by Dr. Raja Appuswamy (EURECOM) and Dr. Chong Li (Huawei Paris).
I read Computer Science at Université Paris-Saclay (BSc, 2016–2019), where a machine-learning project class taught by Isabelle Guyon and Zhengying Liu first drew me toward the field, then took my MSc at Sorbonne Université (2019–2022) under Profs. Emmanuel Chailloux and Jacques Malenfant.
I joined Huawei Paris in 2021 as a research engineer in AI infrastructure, working on hybrid parallelism and memory optimisation for the LLM pretraining stack behind the company's foundation-scale models. In 2023 the work evolved into a PhD under the French CIFRE programme; the planners studied in the thesis are integrated into Huawei's MindSpore / D-Rec runtime and validated on Ascend-910 production clusters of 10K+ NPUs. I expect to defend in 2026.
Research focus
Scaling AI is increasingly a systems problem. A frontier model is trained by spreading one computation across thousands of accelerators for days or weeks at a time, and how that computation is partitioned and scheduled across them — the parallelism strategy — largely sets how much of the hardware is actually used, and with it the time, energy and cost of a run.
Today that strategy is mostly found by hand: re-tuned for each model and cluster through profiling and expert intuition, and redone whenever the hardware or model changes. I want to know whether it can instead be reasoned about from first principles:
How do we systematically plan parallelism for foundation-scale training without per-cluster profiling, and without losing portability when the hardware or model changes?
My approach builds closed-form cost models that predict how a strategy will behave before it runs — so a good plan can be found analytically, then carried across hardware and model changes rather than rebuilt each time. The planners that come out of this are described under Systems.
Systems
Four planners built during my PhD, each turning one research thread into a strategy planner you can run. The planners are named for what they do: PRISM (symbolic memory), BMPipe (bubble–memory), H²O (hyper-parameter optimisation), ManuMatic (manual + automatic strategy).
Three of these — PRISM, BMPipe and H²O — are combined in a live interactive demo: SAPP Search Studio →
PRISM
SCA / HPCAsia 2026 · Profiling-free symbolic memory-driven strategy planner. Predicts memory under arbitrary recomputation policies via closed-form per-component expressions; the communication side is modelled with collective-volume and topology-aware templates.
92–96% memory-prediction accuracy (median error ≈ 7%) across DeepSeek3 / Llama2-3 / Mistral / TextHawk / Llama-MoE on Ascend-910 up to 1,024 NPUs; up to 1.43× MFU speedup over Megatron-LM.
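To illustrate what a closed-form, profiling-free memory estimate can look like, here is a minimal Python sketch. It is not PRISM's model: it covers only static model state under mixed-precision Adam (the commonly cited 2 + 2 + 12 bytes per parameter), ignores activations and recomputation entirely, and the function name and the even-sharding assumptions are hypothetical.

```python
def static_state_bytes(n_params: int, tp: int, pp: int, dp: int,
                       shard_optimizer: bool = False) -> float:
    """Per-device bytes for weights, gradients and Adam state (illustrative only).

    Assumes parameters split evenly over tp * pp devices, fp16 weights and
    gradients (2 bytes each) and fp32 Adam state (12 bytes: master copy,
    momentum, variance). With shard_optimizer, the Adam state is further
    divided across the dp data-parallel replicas.
    """
    local_params = n_params / (tp * pp)
    weights_and_grads = (2 + 2) * local_params
    optimizer_state = 12 * local_params
    if shard_optimizer:
        optimizer_state /= dp
    return weights_and_grads + optimizer_state


GIB = 1024 ** 3
dense = static_state_bytes(8_000_000_000, tp=4, pp=2, dp=8)
sharded = static_state_bytes(8_000_000_000, tp=4, pp=2, dp=8, shard_optimizer=True)
print(f"{dense / GIB:.1f} GiB without sharding, {sharded / GIB:.1f} GiB with")
```

Because the estimate is a plain expression in the parallel degrees, it can be evaluated for thousands of candidate strategies without touching an accelerator, which is the property a profiling-free planner relies on.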
BMPipe
IEEE CLUSTER 2025 · Bubble–memory co-optimisation strategy planner. Formalises pipeline idle time into a tri-category taxonomy (recomputation, imbalance and preparation bubbles) and solves a joint pipeline–memory ILP for very-large DNN training.
1.36× speedup over Megatron-Even on a 10K+ NPU Ascend-910 cluster; up to 1.70× under 1F1B-Interleave; the ILP solves in under 200 ms at production scale.
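For context on why pipeline bubbles matter, the textbook idle fraction of a synchronous pipeline schedule can be written in closed form. The sketch below assumes p perfectly balanced stages and m micro-batches, so it ignores the recomputation, imbalance and preparation bubbles described above; it is not BMPipe's model, and the function name is illustrative.

```python
def ideal_bubble_fraction(stages: int, micro_batches: int) -> float:
    """Idle fraction of a balanced synchronous pipeline (GPipe / 1F1B style).

    Each device does micro_batches units of work inside a schedule that
    lasts micro_batches + stages - 1 units, so stages - 1 units are idle.
    """
    if stages < 1 or micro_batches < 1:
        raise ValueError("stages and micro_batches must be positive")
    return (stages - 1) / (micro_batches + stages - 1)


# More micro-batches amortise the fixed warm-up and drain cost.
for m in (8, 32, 128):
    print(f"p=8, m={m}: {ideal_bubble_fraction(8, m):.1%} idle")
```

Raising the micro-batch count shrinks this ideal bubble but increases the activations a stage must hold, which is one reason bubbles and memory are worth optimising together.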
H²O
Euro-Par 2025 · Best Poster · Holistic hyper-parameter optimisation. A two-level optimiser: the outer level searches the macro tuple (DP, TP, PP, OP, micro-batch) with a fast symbolic Delta cost model, and the inner level refines layer-to-stage assignment and per-layer recomputation via ILP. Together they form a joint solver spanning all macro degrees and stage assignment simultaneously.
+36.7% speedup over the D-Rec baseline on 128-device clusters; 35.7% MFU on DeepSeek-141B — without any accelerator-time profiling.
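The outer level of such a two-level search can be sketched as a cheap enumeration followed by a top-K cut. The Python below is a toy under stated assumptions: it enumerates only (DP, TP, PP), the `toy_cost` function and its constants are invented stand-ins for a symbolic cost model, no memory constraint is applied, and the inner ILP refinement is omitted. None of the names come from H²O itself.

```python
import heapq
from itertools import product


def macro_candidates(devices: int, max_tp: int = 8):
    """Yield every (dp, tp, pp) whose product equals the device count."""
    divisors = [d for d in range(1, devices + 1) if devices % d == 0]
    for tp, pp in product(divisors, divisors):
        if tp <= max_tp and devices % (tp * pp) == 0:
            yield devices // (tp * pp), tp, pp


def toy_cost(dp: int, tp: int, pp: int, micro_batches: int = 32) -> float:
    """Hypothetical stand-in for a symbolic cost model (lower is better)."""
    bubble = (pp - 1) / (micro_batches + pp - 1)  # ideal pipeline idle share
    tp_penalty = 0.02 * (tp - 1)                  # made-up communication term
    return (1 + tp_penalty) / (1 - bubble)


def top_k(devices: int, k: int = 3):
    """Outer level: rank macro tuples cheaply and keep the k best."""
    return heapq.nsmallest(k, macro_candidates(devices),
                           key=lambda c: toy_cost(*c))


print(top_k(128))
```

In a real planner each surviving tuple would then be handed to the more expensive inner solver, so the quality of the cheap ranking determines how small K can safely be.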
ManuMatic
IFIP NPC 2025 · Strategy injection for robust automatic hybrid parallelism. A structured interface lets users pin a small set of performance-critical operators while the planner derives a globally consistent strategy for the rest of the graph. It is built atop the D-Rec planner inside MindSpore and reduces exactly to D-Rec when no constraints are provided.
Up to 2.24× over D-Rec on Mixtral-8×7B with expert parallelism (8 NPUs); 2.04× on Llama3-8B at 8K context; 1.45× / 1.30× on Qwen2.5-72B at 32K context (64 NPUs, combined with BMPipe).
Demo
An interactive integration of three planners from the Systems section — PRISM (memory cost surface), BMPipe (pipeline-bubble ILP) and H²O (two-level search) — running as a single browser-based research tool.
SAPP Search Studio
Unified ND + PPB strategy search for large-scale LLM training
sapp.ruiwen.wang
Open live demo ↗
What you'll see
- Configure a strategy search: framework and device, top-K candidates, PPB time and memory budgets, search dimensions.
- Launch the inner PPB solver — BMPipe's bubble–memory ILP — under tight memory constraints.
- Inspect the ranked candidate table: predicted end time, ND peak memory, PPB solver peak, perf score, feasibility.
- Generate per-candidate pipeline timeline visualisations and explore strategies layer by layer.
Inside the box
- Memory cost surface — PRISM's profiling-free symbolic model evaluates memory and communication for each candidate without running the workload.
- Pipeline-bubble solver — BMPipe's joint pipeline–memory ILP segments layers under tight memory budgets and selects per-layer recomputation.
- Top-K refinement — H²O's two-level decomposition: a fast macro-degree search yields top-K candidates, refined by the inner solver.
The workspace opens empty. Click Launch search study to run a real solver pass and populate the timeline and candidate table — the demo executes actual workloads server-side.
Publications
PRISM: Profiling-Free Symbolic Memory-Driven Strategy Planner for Large DNN Model Training.
SCA / HPCAsia 2026, Osaka, Japan · Jan 26–29, 2026
MLLM Pipeline Bubble Modeling for Large-Scale Training.
SCA / HPCAsia 2026, Osaka, Japan · Jan 26–29, 2026
Training Report of TeleChat3-MoE.
arXiv preprint · 2025
ManuMatic: Strategy Injection for Robust Automatic Hybrid Parallelism in Distributed DNN Training.
22nd IFIP NPC 2025, Nha Trang, Vietnam · Nov 14–16, 2025
BMPipe: Bubble-Memory Co-optimization Strategy Planner for Very-Large DNN Training.
27th IEEE CLUSTER 2025, Edinburgh, UK · Sep 2–5, 2025
H²O: Holistic Hyper-Parameter Optimization for Large-Scale Deep Neural Network Training.
31st Euro-Par 2025, Dresden, Germany · Aug 25–29, 2025 · Best Poster Award
SCOPE — Symbolic Computation-Memory Optimization for Pipeline Efficiency in Ultra-Scale DNN Training.
DP2E-AI 2025 Workshop, Paris, France · Jun 2–6, 2025
Patents
Devices and methods for generating execution plans for a machine learning model.
WO 2025/020165 A1 · PCT/CN2023/109533 · Huawei Technologies · filed Jul 27, 2023 · published Jan 30, 2025
People
The people behind this work — advisors, mentors, co-authors and friends.
PhD supervisors
- Dr. Raja Appuswamy — Assistant Professor (HDR), EURECOM · director
- Dr. Chong Li — Principal Researcher, Huawei Paris · co-director
MSc advisors
- Prof. Emmanuel Chailloux — Professor, Sorbonne Université / LIP6
- Prof. Jacques Malenfant — Professor, Sorbonne Université / LIP6
Undergraduate mentors
- Prof. Isabelle Guyon — Research Director, Google DeepMind · Chaired Professor, Université Paris-Saclay (LISN)
- Dr. Viviane Pons — Associate Professor, Université Paris-Saclay (LISN)
- Dr. Zhengying Liu — Researcher, Moonshot AI · PhD, Université Paris-Saclay (LRI)
Co-authors
- Thibaut Tachon — Research Engineer, Huawei Paris · PRISM · BMPipe · H²O · SCOPE
- Philippe Fang — Research Engineer, Huawei Paris · PRISM
- Zhengdao Yu — PhD candidate (CIFRE), Huawei Paris × Sorbonne Université · MLLM Pipeline Bubble
- Nelson Lossing — Research Engineer (HPC), Huawei Paris · MLLM Pipeline Bubble
Friend
- Quentin Petit — Postdoctoral Researcher, Mines Paris – PSL · HPC / AI · formerly Huawei Paris
Experience & Education
Experience
- 2021– Huawei Paris — Research Engineer · HPC and AI system optimisations for large-scale LLM training and inference, on Ascend-910 production clusters of 10K+ NPUs
- 2023– Sorbonne Université × EURECOM — CIFRE Joint PhD Programme · Doctoral research in HPC / MLSys under joint academic and industrial supervision
Education
- 2023–26 PhD in Computer Science — Sorbonne Université · EURECOM · with Huawei Paris · advised by Raja Appuswamy & Chong Li · expected
  Thesis: Systematic and Portable Optimisation of Hybrid Parallelism for Large-Scale Distributed Training
- 2019–22 MSc in Computer Science — Sorbonne Université, Paris · advised by Emmanuel Chailloux & Jacques Malenfant
- 2016–19 BSc in Computer Science — Université Paris-Saclay · mentored by Isabelle Guyon, Viviane Pons & Zhengying Liu
