Hi, I'm Sudhir!
I am a Machine Learning Engineer specializing in LLM inference optimization, Retrieval-Augmented Generation (RAG), and Agentic AI. I am currently a Research Assistant and MS in Data Science candidate at Indiana University Bloomington (graduating May 2026), seeking MLE and Data Scientist roles.
Most recently, I interned as a Machine Learning Engineer at Adobe (San Jose, CA), where I built an LLM-as-Judge evaluation framework that benchmarks frontier models on multimodal chart quality assessment, along with a LangGraph agent with MCP integration for automated online evaluation feedback. At S&P Global, I deployed fine-tuned models on NVIDIA Triton with MLflow/DVC pipelines. At American Express, I built production classifiers and demand prediction models at enterprise scale.
I write an LLM inference series on Medium covering KV caching, speculative decoding, continuous batching, and LoRA from raw PyTorch; every post includes full mathematical derivations. My open-source work is on GitHub.
Currently, I'm working through CUDA kernels for LLM inference in CUDA-AI-Kernels, building up from thread/memory hierarchy, coalescing, and reduction patterns toward deep-learning kernels like softmax and a worked-out flash attention, with notes on register tiling, vectorized (float4) loads, and Nsight profiling.
Feel free to reach out!
Research Interests
- LLM Inference Optimization
- Retrieval-Augmented Generation (RAG)
- Agentic AI & Multi-step Reasoning
- Machine Learning Systems