Building Switchboard: Technical Architecture Behind Intelligent AI Routing
Building a platform that intelligently routes prompts across 40+ AI models whilst maintaining sub-second response times requires careful architectural decisions. Here's how we built Switchboard's backend to handle millions of requests whilst ensuring reliability, scalability, and seamless user experience.
Hybrid Architecture: Best of Both Worlds
I think we can all agree that FastAPI is amazing, especially when making the most of its async functionality. It powers the backbone of our backend, allowing us to scale to thousands of users without breaking the bank. For deployment, we leverage GCP's Cloud Run to host our Python backend. This lets us scale horizontally, pay only for the resources we use, and keep infrastructure management seamless. Many engineering teams opt for a Kubernetes deployment, and whilst we wouldn't rule one out in the future, at our current stage in the company's lifecycle speed and adaptability are key.
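To see why async matters here, consider that most of an LLM request's lifetime is spent waiting on upstream network I/O. A minimal stdlib-only sketch of the pattern (no FastAPI required to illustrate it; the latency figure and function names are hypothetical):

```python
import asyncio

# Stand-in for a slow upstream model call: nearly all of the
# request's lifetime is spent waiting on network I/O.
async def call_model(prompt: str) -> str:
    await asyncio.sleep(0.1)  # hypothetical provider latency
    return f"response to {prompt!r}"

async def handle_many(prompts: list[str]) -> list[str]:
    # Async handlers overlap the waiting: 20 in-flight requests take
    # roughly the latency of one, not the sum of all of them.
    return await asyncio.gather(*(call_model(p) for p in prompts))
```

Inside an async FastAPI endpoint, the event loop does this overlapping for you across concurrent requests, which is what lets a small Cloud Run deployment serve a large number of users.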
For our database, we have opted for a GCP Cloud SQL Postgres cluster (if you couldn't tell, we quite like GCP). Whilst there are other options that would let us integrate a database without having to worry about scaling in the future (like Firebase Firestore, Spanner, etc.), we reckon that if a scalable Postgres cluster is good enough for Notion, it's good enough for us!
Firebase for Authentication
We chose Firebase Authentication for its reliability and ease of integration, but extended it with a sophisticated backend validation system. When users authenticate through Firebase, they receive JWT tokens that our Python backend validates on every request.
The system supports both authenticated and anonymous users. Anonymous users receive limited access with daily usage quotas, whereas authenticated users get much higher limits.
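The quota logic itself can be kept simple. A sketch of a per-day counter, assuming hypothetical limits (the real numbers aren't public) and an in-memory store standing in for Postgres:

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical quota numbers -- the real limits are not public.
DAILY_LIMITS = {"anonymous": 10, "authenticated": 500}

@dataclass
class UsageTracker:
    # (user_id, day) -> requests used; in production this would live
    # in the database, not in process memory.
    counts: dict = field(default_factory=dict)

    def check_and_increment(self, user_id: str, tier: str, today: date) -> bool:
        """Return True if the request is allowed under today's quota."""
        key = (user_id, today)
        used = self.counts.get(key, 0)
        if used >= DAILY_LIMITS[tier]:
            return False
        self.counts[key] = used + 1
        return True
```

Keying on the date means counters reset naturally at the day boundary without a scheduled job.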
API Design: RESTful with Streaming Support
Our API follows RESTful principles with careful attention to real-time requirements. Robustness is a key consideration: if a model provider is taking too long to return a response, we quickly switch to another. This is one of the key benefits of using Switchboard over integrating with LLM providers yourself: you work with a really easy-to-use interface or Developer API whilst feeling confident that you'll always get a response.
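The timeout-and-switch behaviour can be sketched with `asyncio.wait_for`. The provider functions and timeout value below are hypothetical stand-ins:

```python
import asyncio

# Hypothetical provider callables -- in production these would be
# streaming API clients for each upstream model.
async def flaky_provider(prompt: str) -> str:
    await asyncio.sleep(10)  # simulates a provider that hangs
    return "too late"

async def backup_provider(prompt: str) -> str:
    await asyncio.sleep(0.05)
    return "backup answer"

async def complete_with_failover(prompt: str, providers, timeout: float = 0.5) -> str:
    # Try each provider in order; if one exceeds the per-attempt
    # timeout, cancel it and fall through to the next, so the caller
    # always gets a response.
    for provider in providers:
        try:
            return await asyncio.wait_for(provider(prompt), timeout=timeout)
        except asyncio.TimeoutError:
            continue
    raise RuntimeError("all providers timed out")
```

The same idea extends to streaming responses by applying the timeout to the first token rather than the whole completion.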
Smart Routing: The Technical Heart
The routing algorithm is the core technical innovation of Switchboard. We trained our router on pairwise comparisons. Sometimes you'll see two side-by-side responses with voting buttons at the bottom when using Switchboard. Surprise! This is how we improve the quality of our routing. From these comparisons, we train a proprietary model that analyses each incoming prompt and selects the optimal model for your use case.
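One common way to turn pairwise votes into per-model quality scores is an Elo-style update; Switchboard's actual training procedure is proprietary, so treat this purely as an illustration of the idea:

```python
# Elo-style aggregation of side-by-side votes into model ratings.
def elo_update(ratings: dict, winner: str, loser: str, k: float = 32.0) -> None:
    ra, rb = ratings[winner], ratings[loser]
    # Probability the winner "should" have won, given current ratings.
    expected_win = 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))
    # Upset wins move ratings more than expected wins.
    ratings[winner] = ra + k * (1.0 - expected_win)
    ratings[loser] = rb - k * (1.0 - expected_win)

ratings = {"model-a": 1000.0, "model-b": 1000.0}
# Suppose users preferred model-a in three side-by-side votes.
for _ in range(3):
    elo_update(ratings, winner="model-a", loser="model-b")
```

A learned router goes further by conditioning on the prompt itself, but vote aggregation like this is the simplest baseline for the same signal.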
Payment Integration: Stripe with Intelligent Proxying
Payment processing follows a three-tier architecture: frontend payment UI, Next.js proxy routes, and backend Stripe integration. Using Stripe allowed us to quickly and securely integrate payments into Switchboard, without needing to handle things like invoices, receipts, etc. ourselves. Stripe of course takes a cut of revenue, but this is a small price to pay for the ease of integration.
Next.js API routes handle checkout session creation and customer portal access, validating authentication before proxying requests to our Python backend. The backend manages subscription states, trial periods, and usage quotas, automatically adjusting user capabilities based on subscription status.
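The last step, adjusting capabilities from subscription status, boils down to a pure mapping. A sketch using Stripe's standard subscription status strings, with tier numbers that are illustrative only:

```python
# Hypothetical mapping from Stripe subscription status to in-app
# capabilities; the quotas here are illustrative, not Switchboard's
# real limits.
def capabilities_for(status: str) -> dict:
    if status in ("active", "trialing"):
        return {"daily_quota": 500, "premium_models": True}
    if status == "past_due":
        # Grace period: keep access but flag the account.
        return {"daily_quota": 500, "premium_models": True, "payment_warning": True}
    # canceled, unpaid, incomplete, etc.
    return {"daily_quota": 25, "premium_models": False}
```

Keeping this a pure function of the Stripe-reported status means webhook handlers stay trivial: persist the new status, recompute capabilities.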
Multi-Provider Integration: Unified Interface
Integrating 40+ models from different providers - OpenAI, Anthropic, Google, Meta, Mistral, and others - requires careful abstraction and error handling. This abstraction lets us quickly add new models as they become available. We strive for near-100% test coverage of our backend code, including frequent integration tests against all providers in live mode (yes, this can be expensive if done for every commit or PR, so we try to find a good balance).
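The abstraction typically looks like one interface plus one adapter per vendor. A minimal sketch (class and registry names are our own invention, not Switchboard's internals):

```python
from abc import ABC, abstractmethod

class Provider(ABC):
    """Unified interface; real adapters would wrap each vendor SDK."""

    name: str

    @abstractmethod
    def complete(self, model: str, prompt: str) -> str: ...

class EchoProvider(Provider):
    # Stand-in adapter so the sketch runs without API keys.
    name = "echo"

    def complete(self, model: str, prompt: str) -> str:
        return f"[{model}] {prompt}"

REGISTRY: dict[str, Provider] = {}

def register(provider: Provider) -> None:
    # Adding a new vendor is one adapter class plus one registry entry.
    REGISTRY[provider.name] = provider
```

With this shape, the router only ever sees `Provider.complete`, so new models slot in without touching routing code.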
Each provider integration includes comprehensive retry logic, rate limiting, and failover mechanisms. When a primary model is unavailable, our system automatically selects the next best model as recommended by the Smart Router. If all goes well, the user is never aware of this by design.
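Retry-then-failover composes naturally with a ranked model list. A sketch with exponential backoff, where `call_fn` stands in for a real provider call:

```python
import time

def call_with_failover(ranked_models, call_fn, retries: int = 2, backoff: float = 0.01):
    # ranked_models: models in the order the router recommends them.
    # call_fn(model) -> str, raising on provider errors (hypothetical).
    last_error = None
    for model in ranked_models:
        for attempt in range(retries):
            try:
                return call_fn(model)
            except Exception as exc:
                last_error = exc
                # Exponential backoff before retrying the same model.
                time.sleep(backoff * (2 ** attempt))
    raise RuntimeError("all models failed") from last_error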
Scalability Considerations
Switchboard's architecture scales horizontally across multiple dimensions. The Python backend runs on auto-scaling infrastructure that responds to traffic patterns, whilst our Next.js frontend leverages Vercel's edge network for global performance.
- Stateless Design: All services designed for horizontal scaling without session affinity
- Database Optimisation: Efficient indexing and query patterns for chat history and user data
- Monitoring & Observability: Comprehensive logging and metrics collection for performance optimisation
Development and Deployment
Our development workflow allows for rapid iteration while maintaining production stability. The frontend deploys through Vercel's continuous deployment, whilst the Python backend uses containerised images deployed using GitHub Actions. This is pretty standard stuff, but it works! One consideration is the storage space we consume as Deep Learning Docker images can get quite large (especially when working with GPU enabled images). Whilst you could argue that other areas are more expensive (e.g., LLM inference), we will always keep our "startup" hat on and look for ways to reduce costs.
Looking Forward: Technical Evolution
Our technical roadmap focuses on three key areas: improving routing intelligence through R&D, improving performance by using better caching and optimisation, and creating an amazing user experience for our customers (which includes careful architectural design considerations).
So far, building Switchboard has been amazing. It's required solving complex distributed systems challenges and ensuring simplicity for end users. The result is a platform that maximises utility for the end user, without sacrificing performance or ease of use.