Blog
Deep-dive technical articles on Medium, implementing LLM inference primitives from scratch with full math derivations.
Demystifying GPTQ: From Lagrange Multipliers to Vectorized PyTorch
Derives the GPTQ weight-quantization update from scratch with Lagrange multipliers, works through a 2D example by hand, and translates it into vectorized PyTorch. Covers the inverse-Hessian compensation that lets 4-bit LLMs keep their perplexity.
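As a rough illustration of the inverse-Hessian compensation, here is a minimal 2D sketch in plain Python. It assumes the standard single-weight update (quantize one weight, then shift the others by the error scaled through the inverse Hessian); the function names, the integer rounding grid, and the example Hessian are hypothetical and not taken from the article.

```python
def inverse_2x2(h):
    """Invert a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = h
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def quantize_with_compensation(w, h, q_index, grid_step=1.0):
    """Quantize w[q_index] to the grid and compensate the remaining weights.

    Assumed update: w <- w - (w_q - quant(w_q)) / Hinv[q][q] * Hinv[:, q]
    """
    h_inv = inverse_2x2(h)
    quantized = round(w[q_index] / grid_step) * grid_step
    # Quantization error, scaled by the diagonal inverse-Hessian entry.
    err = (w[q_index] - quantized) / h_inv[q_index][q_index]
    # Entry q_index lands exactly on the grid; the others absorb the error.
    return [w_i - err * h_inv[i][q_index] for i, w_i in enumerate(w)]


# Hypothetical example: Hessian [[2, 1], [1, 2]], weights [0.3, 0.7].
updated = quantize_with_compensation([0.3, 0.7], [[2.0, 1.0], [1.0, 2.0]], q_index=0)
```

With these inputs the first weight is rounded to 0 and the second moves from 0.7 to about 0.85, which is the compensation step in miniature.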
Minimal LLM Inference: Continuous Batching
Implements the scheduling loop, iteration-level batching, and KV-cache eviction policy that power high-throughput LLM serving. Explains why sequential serving leaves 70%+ of GPU compute idle.
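To make the idle-compute argument concrete, the toy model below counts decode iterations under static batching versus iteration-level (continuous) batching. It is a sketch under strong assumptions: each request is reduced to a number of tokens left to generate, and prefill, KV-cache memory limits, and eviction are ignored. All names are hypothetical.

```python
from collections import deque


def static_batching_steps(lengths, max_batch):
    """Each batch runs until its longest sequence finishes."""
    steps = 0
    for i in range(0, len(lengths), max_batch):
        steps += max(lengths[i:i + max_batch])
    return steps


def continuous_batching_steps(lengths, max_batch):
    """Finished sequences free their slot at every iteration."""
    waiting = deque(lengths)
    active = []
    steps = 0
    while waiting or active:
        # Admit waiting requests into any free slots.
        while waiting and len(active) < max_batch:
            active.append(waiting.popleft())
        # One decode iteration: every active sequence emits one token.
        active = [n - 1 for n in active]
        # Retire sequences that have no tokens left to generate.
        active = [n for n in active if n > 0]
        steps += 1
    return steps
```

For requests needing `[2, 8, 2, 8]` tokens and a batch size of 2, the static scheduler takes 16 iterations and the continuous one takes 12, because short requests no longer wait for long ones to finish.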
Speculative Decoding: The Clever Trick Making LLMs 2x Faster
Full derivation of the draft-verify acceptance probability, gamma sweep analysis for optimal draft length, and wall-clock benchmarks comparing DistilGPT2-drafted vs. standard GPT-2 decoding.
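For orientation, the quantities behind a gamma sweep can be sketched in a few lines. This assumes the standard speculative sampling rule, where a drafted token x is accepted with probability min(1, p(x)/q(x)) for target distribution p and draft distribution q; the article's own notation may differ, and the function names are hypothetical.

```python
def acceptance_rate(p, q):
    """Probability that one drafted token is accepted: sum_x min(p(x), q(x))."""
    return sum(min(p_x, q_x) for p_x, q_x in zip(p, q))


def expected_tokens_per_iteration(alpha, gamma):
    """Expected tokens emitted per target-model pass with gamma drafted tokens.

    Assumes a constant per-token acceptance rate alpha; the closed form is the
    geometric sum 1 + alpha + ... + alpha**gamma.
    """
    if alpha == 1.0:
        return float(gamma + 1)
    return (1 - alpha ** (gamma + 1)) / (1 - alpha)
```

For example, with `p = [0.5, 0.3, 0.2]` and `q = [0.4, 0.4, 0.2]` the acceptance rate is 0.9. Larger gamma helps only while alpha stays high, which is why the optimal draft length depends on how well the draft model tracks the target.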
Accelerating Transformer Inference with KV Caching
Derives exact VRAM consumption for KV cache given model width, heads, and sequence length. Walks through a from-scratch PyTorch implementation showing precisely where and why memory explodes with context length.
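The memory estimate can be sketched as a one-line formula: two tensors (keys and values) per layer, each holding one vector of size `head_dim` per head per token. The helper below is a hypothetical illustration that assumes standard multi-head attention with no grouped-query sharing and a fixed element size.

```python
def kv_cache_bytes(n_layers, n_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2):
    """Bytes needed to cache keys and values for every layer and token.

    The leading factor 2 accounts for storing both K and V.
    bytes_per_elem=2 assumes fp16/bf16 storage.
    """
    return (2 * n_layers * n_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)


# GPT-2 small shapes (12 layers, 12 heads, head_dim 64) at 1024 tokens, fp16.
gpt2_small_bytes = kv_cache_bytes(n_layers=12, n_heads=12, head_dim=64, seq_len=1024)
```

For those GPT-2 small shapes the result is 37,748,736 bytes, or 36 MiB per sequence. The cost grows linearly in both sequence length and batch size.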
LoRA: Building on Fundamental Principles for Low-Rank Adaptation
Implements LoRA rank decomposition without any PEFT library, covering the math behind low-rank approximations and demonstrating significant parameter reduction with minimal quality loss.
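The parameter reduction follows from simple counting: a full update to a `d_out x d_in` weight matrix has `d_out * d_in` entries, while a rank-`r` factorization into `B` (`d_out x r`) and `A` (`r x d_in`) has `r * (d_in + d_out)`. The sketch below uses hypothetical helper names and an illustrative 768 x 768 layer with rank 8; these numbers are not results from the article.

```python
def full_update_params(d_in, d_out):
    """Trainable parameters when the whole weight matrix is fine-tuned."""
    return d_in * d_out


def lora_update_params(d_in, d_out, rank):
    """Trainable parameters for the low-rank pair B (d_out x r) and A (r x d_in)."""
    return rank * (d_in + d_out)


# Illustrative shapes only: one square 768 x 768 projection at rank 8.
full = full_update_params(768, 768)
lora = lora_update_params(768, 768, rank=8)
reduction = full / lora
```

Here `full` is 589,824, `lora` is 12,288, and the reduction factor is 48 for that single layer.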
Reward Model Training for RLHF
Trains a reward model from paired preference data for an RLHF pipeline, covering loss functions, the math of the Bradley-Terry model, and training stability tricks for robust preference learning.
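Under the Bradley-Terry model, the probability that the chosen response beats the rejected one is the sigmoid of the reward difference, so the usual pairwise loss is the negative log of that sigmoid. The function below is a minimal scalar sketch with a hypothetical name; it is written in the numerically stable softplus form and is not the article's training code.

```python
import math


def bradley_terry_loss(reward_chosen, reward_rejected):
    """Pairwise loss -log(sigmoid(r_chosen - r_rejected)) for one preference pair.

    Computed as softplus(-diff) = max(-diff, 0) + log1p(exp(-|diff|)),
    which avoids overflow for large reward gaps.
    """
    diff = reward_chosen - reward_rejected
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))
```

When both rewards are equal the loss is log 2. It approaches zero as the chosen reward pulls ahead and grows roughly linearly when the ranking is wrong.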
Bristol-Myers Squibb Molecular Translation — Part 1: Introduction and EDA
Overview of the Kaggle molecular translation challenge, dataset exploration, and analysis of IUPAC name distributions. Sets up the problem framing for the deep learning approach.
Bristol-Myers Squibb Molecular Translation — Part 2: Deep Learning Modelling with LSTM
Details the EfficientNet + LSTM + Bahdanau attention architecture, TPU training strategy, teacher forcing, beam search, and the final result: a Levenshtein distance of 8.9 (top 500).
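For readers unfamiliar with the metric, Levenshtein distance counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. The sketch below is a standard dynamic-programming implementation for reference; it is not the competition's scoring code.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b, keeping two DP rows in memory."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]  # Turning a[:i] into the empty string takes i deletions.
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            curr.append(min(
                prev[j] + 1,                      # delete char_a
                curr[j - 1] + 1,                  # insert char_b
                prev[j - 1] + substitution_cost,  # substitute or match
            ))
        prev = curr
    return prev[-1]
```

For example, `levenshtein("kitten", "sitting")` is 3. A score of 8.9 therefore means predictions were, on average, about nine character edits away from the target string.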