Projects
You can find most of my work on the 🤗 Hugging Face Hub. As a full-stack foundational AI researcher, I work across the following areas:
Pre-Training
- My implementations of DeepSeek Multi-Head Latent Attention and DeepSeek MoE
- My tutorial on Implementing Transformer from Scratch: A Step-by-Step Guide
Post-Training
- My implementation of ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
Optimizer
- My tutorial on Understanding the Muon Optimizer: Theory and Implementation
Distributed Training
- My reverse-engineered, annotated re-implementation of Moonshot’s Muon Is Scalable paper, Distributed Muon: CPU-Friendly Implementation of a Multi-Node Optimizer