Sudhir Pol

Machine Learning Engineer

Hi, I’m Sudhir! 👋

I am a Machine Learning Engineer specializing in LLM inference optimization, Retrieval-Augmented Generation (RAG), and Agentic AI. I am currently a Research Assistant and an MS in Data Science candidate at Indiana University Bloomington (graduating May 2026), and I am seeking Machine Learning Engineer (MLE) and Data Scientist roles.

Most recently, I interned as a Machine Learning Engineer at Adobe (San Jose, CA). There I built an LLM-as-Judge evaluation framework that benchmarks frontier models on multimodal chart quality assessment, along with a LangGraph agent with Model Context Protocol (MCP) integration that automates online evaluation feedback. At S&P Global, I deployed fine-tuned models on NVIDIA Triton Inference Server, backed by MLflow/DVC pipelines. At American Express, I built production classifiers and demand prediction models at enterprise scale.

I write an LLM inference series on Medium covering KV caching, speculative decoding, continuous batching, and LoRA, each implemented in raw PyTorch. Every post includes full mathematical derivations. My open-source work is on GitHub.

Currently, I’m working through CUDA kernels for LLM inference in CUDA-AI-Kernels. I'm building up from the thread/memory hierarchy, memory coalescing, and reduction patterns toward deep-learning kernels such as softmax and a worked-out flash attention, with notes on register tiling, vectorized (float4) loads, and Nsight profiling.

Feel free to reach out!

Research Interests

LLM Inference Optimization
Retrieval-Augmented Generation (RAG)
Agentic AI & Multi-step Reasoning
Machine Learning Systems