Blogs

Jan 30, 2026 · 5 min read

How I Built Intelligent Routing Without ML Dependencies

The original routing was pure regex: 85% accuracy, under 5 ms per query, and zero ML dependencies. Then I added LLM-as-judge routing and kept the regex as a fallback. Both are still running in production.

Jan 23, 2026 · 5 min read

Running an AI App End-to-End for Free: LLM Sleep & Token Optimizations

Free tiers are not free. They're a negotiation. Here's how I kept Concierge AI running on $0/month by managing Groq's TPM limits, HuggingFace quotas, Supabase connections, and cold starts.

Jan 16, 2026 · 5 min read

Deploying a Python + Next.js AI App on Vercel Free Tier

Vercel is not a server. Every request spins up an isolated function, runs your code, and dies. Here's how I restructured a FastAPI + Next.js app to fit the 50MB limit and survive cold starts.

Jan 9, 2026 · 5 min read

Building Production Hybrid Search: BM25 + pgvector in Supabase

Vector search alone missed exact tax terms like 'Form 1040-NR'. I built a hybrid BM25 + pgvector pipeline with dynamic weights, reciprocal rank fusion (RRF), and temporal re-ranking, all inside a single Supabase SQL function.

Jan 2, 2026 · 5 min read

Evaluating Your RAG Pipeline with RAGAS, Completely Free

You've built a RAG pipeline. But is it actually good? I used RAGAS to measure retrieval and generation quality across 5 metrics, using only free-tier APIs. Here's exactly how.

Dec 19, 2025 · 5 min read

Building a Production RAG System That Scores 0.80 (And Runs on Free Tier)

I built an AI tax assistant that knows when it's out of its depth. The system scored 0.8043 on RAGAS, an industry-standard evaluation framework. The entire thing runs on free APIs. Zero compute costs.
