Current Focus
Projects
Language Model Pretraining
Pretraining: End-to-end pretraining of a 416M parameter GPT/Llama-style transformer on FineWeb-Edu (10B tokens). Trained on 8xA100 GPUs using Distributed Data Parallel with custom training loops and memory optimizations.
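A minimal sketch of the DDP wiring behind a multi-GPU run like this, assuming a torchrun launch; the linear layer, random batches, learning rate, and step count are placeholders standing in for the actual 416M-parameter transformer, dataloader, and hyperparameters:

# Placeholder DDP training loop; not the project's actual code.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK / LOCAL_RANK / WORLD_SIZE for each process
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(1024, 1024).to(f"cuda:{local_rank}")  # stand-in for the transformer
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

    for step in range(100):
        x = torch.randn(8, 1024, device=f"cuda:{local_rank}")  # stand-in for a token batch
        loss = model(x).pow(2).mean()                           # stand-in for cross-entropy loss
        loss.backward()                                         # DDP all-reduces gradients across ranks here
        optimizer.step()
        optimizer.zero_grad(set_to_none=True)

    dist.destroy_process_group()

if __name__ == "__main__":
    main()  # launch with: torchrun --nproc_per_node=8 train.py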
Llama 3.1 8B Instruction Tuning
Fine-tuning: LoRA fine-tuning of Llama 3.1 8B base model for instruction following. Extensive quantization experiments (4-bit, 8-bit, BF16) with IFEval and tinyMMLU benchmarking. Achieved 52% improvement in instruction following (200 → 305/834 on IFEval) with ~$10 of compute.
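A sketch of a QLoRA-style setup for this kind of run, assuming the usual transformers + peft + bitsandbytes stack; the rank, alpha, dropout, and target modules below are illustrative defaults, not the exact values used in the project, and the base checkpoint requires Hugging Face access approval:

# Illustrative 4-bit LoRA configuration; hyperparameters are not the project's.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                        # one of the compared settings (4-bit / 8-bit / BF16)
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B",                # base (non-instruct) checkpoint
    quantization_config=bnb_config,
    device_map="auto",
)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()            # only the low-rank adapters are trained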
Legal RAG System
RAG / Agents: Production-grade agentic RAG system for legal documents. Full pipeline from data sourcing through retrieval, reranking, and agent orchestration. Hybrid search combining dense embeddings (bge-m3) with sparse retrieval (BM25).
Embeddings: bge-m3, gemini-embedding-001
Indexing: ChromaDB (HNSW), Elasticsearch (BM25)
Retrieval: Hybrid search with RRF (sketched below) and convex combination
Agents: Conversational + search agents, planning, self-triage
Training: 560M parameter model fine-tuning
Full code available on request
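For the fusion step referenced in the retrieval line above, a minimal sketch of Reciprocal Rank Fusion over one dense and one sparse result list; the rrf_fuse helper, document IDs, and k = 60 constant are illustrative rather than taken from the actual pipeline:

# Illustrative Reciprocal Rank Fusion over two ranked ID lists.
from collections import defaultdict

def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked ID lists: score(d) = sum over rankers of 1 / (k + rank(d))."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

dense_hits = ["doc_7", "doc_2", "doc_9"]      # e.g. bge-m3 / ChromaDB results
sparse_hits = ["doc_2", "doc_4", "doc_7"]     # e.g. BM25 / Elasticsearch results
print(rrf_fuse([dense_hits, sparse_hits]))    # doc_2 and doc_7 rise to the top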
Transformer from Scratch
Fundamentals: Clean PyTorch implementation of "Attention Is All You Need" for deep understanding of transformer mechanics. Extended with modern architectural improvements used in current LLMs.
Components: encoder/decoder blocks, multi-head attention, positional encoding, layer normalization
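A compact sketch of the scaled dot-product attention at the heart of such an implementation; the function name, tensor shapes, and random inputs are illustrative and follow the paper's formulation rather than the repo's exact module:

# Illustrative scaled dot-product attention (batch, heads, seq_len, head_dim).
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))  # e.g. causal mask in the decoder
    weights = torch.softmax(scores, dim=-1)
    return weights @ v

q = k = v = torch.randn(2, 8, 16, 64)         # batch=2, heads=8, seq=16, head_dim=64
out = scaled_dot_product_attention(q, k, v)
print(out.shape)                              # torch.Size([2, 8, 16, 64])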
LLM VRAM Calculator
Tooling: Comprehensive tool for estimating GPU memory requirements for LLM training and inference. Supports dense transformers and MoE architectures with detailed breakdowns of weights, gradients, optimizer states, activations, and KV cache.
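A back-of-the-envelope sketch of the kind of arithmetic such a calculator automates; the function names are hypothetical, the per-parameter byte counts assume BF16 weights and gradients with FP32 AdamW moments plus a master copy, activations are ignored, and the KV-cache example uses numbers roughly matching an 8B Llama config:

# Rough VRAM estimates; simplifying assumptions noted inline.
def training_vram_gb(n_params: float,
                     weight_bytes: int = 2,      # BF16 weights
                     grad_bytes: int = 2,        # BF16 gradients
                     optimizer_bytes: int = 12   # FP32 AdamW moments + master weights
                     ) -> float:
    per_param = weight_bytes + grad_bytes + optimizer_bytes
    return n_params * per_param / 1024**3

def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                seq_len: int, batch_size: int, dtype_bytes: int = 2) -> float:
    # factor of 2 covers both keys and values
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch_size * dtype_bytes / 1024**3

print(f"416M model, full training: ~{training_vram_gb(416e6):.1f} GB before activations")
print(f"8B-Llama-like KV cache at 8k context: ~{kv_cache_gb(32, 8, 128, 8192, 1):.1f} GB")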
Open to Opportunities
Looking to join teams building interesting AI systems.