Projects
You can find most of my work on the 🤗 Hugging Face Hub. As a full-stack foundational AI researcher, I work across the following areas:
Pre Training
- My implementations of DeepSeek Multi-Head Latent Attention and DeepSeek MoE
- My tutorial on Implementing Transformer from Scratch: A Step-by-Step Guide
Post Training
- My implementation of ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
Optimizer
- My tutorial on Understanding the Muon Optimizer: Theory and Implementation
Distributed Training
- Coming soon