Blogs
How I Built Intelligent Routing Without ML Dependencies
The original routing was pure regex: 85% accuracy, under 5 ms per query, and zero ML dependencies. Then I added LLM-as-judge routing and kept the regex as a fallback. Both are still running in production.
Running an AI App End-to-End for Free: LLM Sleep & Token Optimizations
Free tiers are not free. They're a negotiation. Here's how I kept Concierge AI running on $0/month by managing Groq's tokens-per-minute (TPM) limits, HuggingFace quotas, Supabase connections, and cold starts.
Deploying a Python + Next.js AI App on Vercel Free Tier
Vercel is not a server. Every request spins up an isolated function, runs your code, and dies. Here's how I restructured a FastAPI + Next.js app to fit the 50 MB function size limit and survive cold starts.
Building Production Hybrid Search: BM25 + pgvector in Supabase
Vector search alone missed exact tax terms like 'Form 1040-NR'. I built a hybrid BM25 + pgvector pipeline with dynamic weights, reciprocal rank fusion (RRF), and temporal re-ranking, all inside a single Supabase SQL function.
Evaluating Your RAG Pipeline with RAGAS, Completely Free
You've built a RAG pipeline. But is it actually good? I used RAGAS to measure retrieval and generation quality across 5 metrics, using only free-tier APIs. Here's exactly how.
Building a Production RAG System That Scores 0.80 (And Runs on Free Tier)
I built an AI tax assistant that knows when it's out of its depth. The system scored 0.8043 on RAGAS, an industry-standard evaluation framework. The entire thing runs on free APIs. Zero compute costs.