Principal AI Engineer · Architect · Fractional CTO
LangGraph agent graphs · RAG pipelines · WebRTC voice agents · multi-tenant AI backends. Engineering discipline, not vibes-based prompting — LangSmith evals and retrieval-precision metrics before anything ships to production.
Engineering Experience at
From greenfield agentic systems to embedding AI inside an existing stack
Not sure which engagement fits?
Schedule Free 30-Min Discovery Call
A production agentic system designed, built, and shipped solo, end-to-end
Yuvan is a live, paying-customer AI product for CBSE Class 10 students. Five schools are paying design partners on a B2B2C model (the school recommends, the parent pays). Full agentic architecture built by Arjun:
Per-doubt agent: embed → semantic FAQ-cache lookup → RAG retrieval over NCERT chunks → GPT-4o-mini generation → math verifier → topic auto-tagger → persistence → re-explain loop on detected student confusion.
OpenAI Realtime (STT + TTS + VAD + barge-in). Backend-minted ephemeral tokens. Sub-1.5s p95 first-audio latency. All reasoning, RAG, and verification stay server-side.
Ingestion, chunking, OpenAI embeddings, top-k retrieval, citation-grounded answers. Global semantic FAQ cache (cosine >= 0.92) targeting ~30% hit rate to push per-session OpenAI cost below the unit-economics ceiling.
4 user roles: Student, Parent, School Admin, Super Admin. Supabase Auth + Postgres RLS for tenant isolation. Parental-consent flow (DPDP / NCPCR-aligned). Weekly parent reports and a school engagement dashboard.
Math verifier: re-checks every numerical claim before the AI speaks it (under 1s overhead). This is the difference between a demo and a tutor that schools actually trust.
Full agent traces, retrieval-precision metrics, and evals before any change ships to production. Engineering discipline, not vibes-based prompting.
Stack: LangGraph · FastAPI (async) · OpenAI GPT-4o-mini · OpenAI Realtime API · pgvector · Supabase · Next.js 15 · Railway · Vercel · LangSmith
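The semantic FAQ cache described above (cosine >= 0.92) can be illustrated with a minimal sketch. It assumes an in-memory list of entries and plain-Python cosine similarity; the names `SemanticFaqCache`, `put`, and `get` are hypothetical and not taken from Yuvan's codebase, and a production version would more likely query pgvector. Only the 0.92 threshold comes from the text.

```python
import math
from dataclasses import dataclass, field
from typing import Optional

SIMILARITY_THRESHOLD = 0.92  # threshold quoted in the text above


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat as no match
    return dot / (norm_a * norm_b)


@dataclass
class SemanticFaqCache:
    """Hypothetical in-memory cache keyed by question embeddings."""

    threshold: float = SIMILARITY_THRESHOLD
    entries: list[tuple[list[float], str]] = field(default_factory=list)

    def put(self, embedding: list[float], answer: str) -> None:
        self.entries.append((embedding, answer))

    def get(self, embedding: list[float]) -> Optional[str]:
        """Return the closest cached answer, or None below the threshold."""
        best_score, best_answer = 0.0, None
        for cached_embedding, answer in self.entries:
            score = cosine_similarity(embedding, cached_embedding)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None
```

On a cache hit the retrieval and generation steps can be skipped entirely, which is how a higher hit rate translates into lower per-session LLM cost.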
Real recommendations from my LinkedIn profile
What: Benchmarked LangChain router, supervisor, and plan-and-execute patterns on real student conversations.
Outcome: Selected the architecture Yuvan ships on. Shipped an early prototype to 900+ users, validating demand before building the full product.
Stack: LangChain · OpenAI · Python
What: Prototyped 4 voice stacks — OpenAI Realtime, Whisper + ElevenLabs, Whisper + Coqui, Gemini Live — with latency/cost benchmarks.
Outcome: Data drove the Realtime + WebRTC selection for Yuvan.
Stack: OpenAI Realtime · WebRTC · Python
Challenge: Protect payment systems from CSRF misuse globally.
Solution: Throttling system on DynamoDB + Java.
Result: 19.7% lower security impact; 15.3% reduction in coupon misuse in first month.
Challenge: Monolith could not handle crypto bull-run traffic.
Solution: Migrated to Python/FastAPI microservices; Redis caching (30% latency drop); PgBouncer connection pooling (40% further latency drop).
Result: System held stable through a 16x user surge.
Challenge: High-scale experimentation infrastructure for product managers.
Solution: In-house pipeline on Scala + Cassandra + Kafka processing millions of messages.
Result: 22% faster time-to-insight.
Challenge: Zero-downtime microservice migration for seller promotions.
Solution: Java, Spring Security, PostgreSQL, Docker, Kubernetes on AWS.
Result: 28% fewer system failures; 25% more transaction volume handled.
Agentic AI and LLMs: LangChain, LangGraph (stateful agent graphs), RAG, semantic FAQ caching, function/tool calling, prompt engineering, evals, agent observability (LangSmith), guardrails and verifier patterns, multi-agent orchestration
LLM Providers and Voice: OpenAI GPT-4o / 4o-mini, Realtime API, Embeddings (text-embedding-3), Anthropic Claude, Gemini, OpenAI Whisper, WebRTC voice agents, streaming + interruption handling
Vector and Retrieval: pgvector, embeddings (OpenAI, Cohere, BGE), hybrid search, chunking strategies, cosine-similarity caching, retrieval evals
AI-Native Backend: Python 3.11+, FastAPI (async), Pydantic v2, asyncpg, pytest-asyncio
Other Backend: Java 21, Spring Boot, Spring Security, Hibernate JPA, Scala
Architecture: Microservices, distributed systems, event-driven (Kafka), HLD/LLD, design patterns, SOLID, clean code, REST + OpenAPI, TDD
Data: PostgreSQL, pgvector, Cassandra, DynamoDB, Redis, MongoDB, PgBouncer, Supabase (Auth + RLS)
Cloud and DevOps: AWS Certified Solutions Architect — S3, ECS, Lambda, IAM | Docker, Kubernetes, GitHub Actions, Jenkins, Railway, Vercel, Grafana, ELK
Frontend: Next.js 15 (App Router), TypeScript, React, Tailwind, shadcn/ui, KaTeX
Testing and QA: TDD, JUnit, Mockito, pytest, pytest-asyncio, httpx
AI-Assisted Dev: Claude Code, Cursor, GitHub Copilot, v0.dev
Leadership: Strategic planning, hiring and team growth (15+ engineers, INR 30Cr P&L), mentoring, curriculum design, stakeholder management, agile/scrum, founder-mode product + GTM
Arjun builds production agentic AI systems from the ground up: LangGraph stateful agent graphs, RAG pipelines on pgvector, WebRTC voice agents using OpenAI Realtime API, multi-tenant AI backends with Supabase, and full-stack AI products with Next.js 15. He runs LangSmith evals and retrieval-precision metrics before anything is called production-ready.
Yes. Arjun is available for fractional CTO engagements, senior freelance AI engineering contracts, and full-time Lead/Principal AI Engineer roles globally. Remote-first. Contact via the form or email at thakurarjun247@gmail.com.
10+ years of production engineering: SDE2 at Amazon (payments/security), Senior Software Developer at Agoda (distributed systems with Scala/Cassandra/Kafka), Lead Developer at CoinSwitch Kuber (scaled from 600K to 10M users), VP Engineering at Nolan EduTech (INR 30 crore revenue in one year), Senior Engineering Manager at ForaySoft/Wayfair, and Director of Engineering at PhysicsWallah (led a 15-engineer team). Pivoted full-time into AI in late 2023 and founded Yuvan in 2025.
Arjun runs LangSmith evals, retrieval-precision metrics, and full agent traces before calling anything production-ready. He implements math verifier patterns to validate numerical claims, guardrails to prevent hallucinations, and semantic FAQ caches to manage per-session LLM costs. The same engineering rigor he brought from Amazon and Agoda applies to AI systems.
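The math verifier pattern mentioned above can be sketched as follows. This is a simplified illustration, assuming claims appear as single binary expressions such as `9 + 6 = 15`; the function name `find_wrong_claims` and the regex are hypothetical, and a real verifier would need to handle units, multi-step working, and symbolic math.

```python
import operator
import re

# Supported binary operators for the simple "a op b = c" claim shape.
_OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}

_NUMBER = r"-?\d+(?:\.\d+)?"
_CLAIM = re.compile(
    rf"({_NUMBER})\s*([+\-*/])\s*({_NUMBER})\s*=\s*({_NUMBER})"
)


def find_wrong_claims(text: str, tolerance: float = 1e-9) -> list[str]:
    """Return every 'a op b = c' claim in the text whose result does not hold."""
    wrong = []
    for match in _CLAIM.finditer(text):
        left, op, right, claimed = match.groups()
        if op == "/" and float(right) == 0.0:
            wrong.append(match.group(0))  # division by zero can never be valid
            continue
        actual = _OPS[op](float(left), float(right))
        if abs(actual - float(claimed)) > tolerance:
            wrong.append(match.group(0))
    return wrong
```

For example, `find_wrong_claims("Since 12 * 7 = 84 and 9 + 6 = 14, the total is 98.")` flags only `"9 + 6 = 14"`. A non-empty result would block the answer from being spoken and trigger regeneration.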
Book a free 30-minute consultation to discuss your agentic AI challenges.