Building Pitch Gym: A Real-Time Voice AI Coach for Startup Founders
- Henil Diwan

- May 22
- 7 min read

The Problem Nobody Talks About
Every founder I've spoken to says the same thing about pitch practice: it's awkward, it's expensive, and there's never enough of it. Y Combinator and Techstars accept fewer than three percent of applicants. Human pitch coaches charge $200–$500 a session. ChatGPT can roleplay an investor, but it lacks the one thing that actually matters — your voice. Pitching is a verbal skill. Tone, pacing, the moment you stumble on a question about unit economics — none of that survives a text chat.
So I built Pitch Gym: a web platform that lets any founder, anywhere, jump into a real-time voice conversation with a simulated investor named Alex. Ten different pitch scenarios. Structured 1–5 scoring. Strengths, weaknesses, and improvement suggestions delivered the second you hang up. The whole thing runs serverless and scales with demand.
This post walks through the architecture, the technical challenges, and the design decisions that made it work.
What It Does
The flow is dead simple from the user's point of view:
1. Pick a pitch type: Elevator, VC Ready, Customer Discovery, Demo Day, and seven more.
2. Optionally tag the session to one of your saved startup ideas and carry context from a previous session.
3. Hit "Join" and a WebRTC call opens to Alex.
4. Pitch for as long as your credits allow (1 credit = 1 minute, fractional precision).
5. Hang up, get a structured scorecard.
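To make the credit rule above concrete (1 credit = 1 minute, fractional precision), here is a minimal sketch of the arithmetic. The helper names `creditsForCall` and `deductCredits` are hypothetical, and rounding to two decimals is an assumption chosen to match the numeric(10,2) column described later in this post; the real deduction happens server-side and may differ.

```typescript
// Hypothetical helper: convert a call duration in seconds to credits
// (1 credit = 1 minute), rounded to two decimals (assumed precision).
function creditsForCall(durationSeconds: number): number {
  const minutes = Math.max(durationSeconds, 0) / 60;
  return Math.round(minutes * 100) / 100;
}

// Hypothetical helper: deduct a call from a balance without letting
// the balance drop below zero.
function deductCredits(balance: number, durationSeconds: number): number {
  const remaining = balance - creditsForCall(durationSeconds);
  return Math.max(Math.round(remaining * 100) / 100, 0);
}
```

Under these assumptions a call of 262 seconds costs 4.37 credits, so a founder who started with 10 credits would have 5.63 left.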

Under the hood, each pitch type maps to a dedicated Vapi assistant configured with its own system prompt and structured output schema. The "Investor Skeptic Mode" assistant is openly hostile. The "Co-Founder Pitch" assistant probes for vision alignment. Different brains, same platform.
Ten Pitch Types, Ten Rubrics
The big mistake most pitch practice tools make is treating every pitch as the same conversation. An elevator pitch and a Demo Day presentation share almost no DNA — one is sixty seconds of clarity and confidence, the other is ten minutes of storytelling under stage lights. Pitch Gym separates them out into ten dedicated scenarios, and each scenario scores a different set of four metrics aligned to what actually matters in that situation.
| # | Pitch Type | Why It Exists | What's Scored |
|---|---|---|---|
| 1 | Idea Validation | Test if your problem is real and worth solving before building anything. | Problem Clarity, Customer Definition, Evidence, Communication |
| 2 | Elevator Pitch | Deliver a crisp, compelling 60-second pitch that sticks. | Clarity, Memorability, Market Framing, Confidence |
| 3 | VC Ready | Full investor-style session covering all pillars of a fundable startup. | Business Model, Traction, Go-to-Market, Pressure Handling |
| 4 | Product Feedback | Get sharp product critique from an investor's lens. | Product Clarity, Differentiation, UX Quality, Value Prop |
| 5 | Customer Discovery | Prove you deeply understand who you're building for and why. | Customer Behavior, Problem Context, Insights, Alternatives |
| 6 | Technical Pitch | Defend your architecture and technical decisions to a technical investor. | Architecture, Feasibility, Scalability, Technical Moat |
| 7 | Demo Day | Simulate a high-stakes Demo Day presentation under the spotlight. | Storytelling, Market Opportunity, Traction, Stage Presence |
| 8 | Customer Pitch | Pitch directly to a potential customer and win their buy-in. | Problem Relevance, Value Prop, Workflow Fit, Cost Justification |
| 9 | Co-Founder Pitch | Convince a potential co-founder to join your vision. | Vision Clarity, Role Alignment, Commitment, Persuasion |
| 10 | Investor Skeptic Mode | Face an aggressive skeptic who challenges every assumption. | Pressure Handling, Competition Defense, Composure, Realism |
A founder pitching the same idea across three scenarios gets three meaningfully different conversations and three meaningfully different scorecards. That's where the value compounds — the analytics dashboard can show that someone is consistently strong on Vision Clarity but weak on Pressure Handling, which is a coachable insight you'd never get from a generic "pitch to an investor" practice tool.
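The "consistently strong on Vision Clarity but weak on Pressure Handling" insight comes down to averaging each metric across sessions. The sketch below shows one way that could work. The `MetricScore` shape, the function names, the minimum sample count, and the strong/weak thresholds are all assumptions made for illustration, not the dashboard's actual code.

```typescript
// Hypothetical shape: one scored metric from one session.
interface MetricScore {
  metric: string; // e.g. "Vision Clarity"
  score: number; // 1 to 5
}

// Average each metric across sessions, skipping metrics with too few samples.
function averageByMetric(scores: MetricScore[], minSamples = 2): Record<string, number> {
  const totals: Record<string, { sum: number; count: number }> = {};
  for (const { metric, score } of scores) {
    const t = totals[metric] ?? { sum: 0, count: 0 };
    totals[metric] = { sum: t.sum + score, count: t.count + 1 };
  }
  const averages: Record<string, number> = {};
  for (const metric of Object.keys(totals)) {
    const { sum, count } = totals[metric];
    if (count >= minSamples) averages[metric] = sum / count;
  }
  return averages;
}

// Bucket metrics around assumed thresholds (4 and above is strong, 2.5 and below is weak).
function classify(averages: Record<string, number>, strongAt = 4, weakAt = 2.5) {
  const strong: string[] = [];
  const weak: string[] = [];
  for (const metric of Object.keys(averages)) {
    if (averages[metric] >= strongAt) strong.push(metric);
    else if (averages[metric] <= weakAt) weak.push(metric);
  }
  return { strong, weak };
}
```

Requiring at least two samples per metric keeps a single unusually good or bad session from being reported as a pattern.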
The Architecture
Pitch Gym is a three-tier serverless app — React 19 on the client, Supabase Edge Functions (Deno) as the trusted API layer, and a managed Postgres database locked down with Row-Level Security.

The client never performs a privileged operation against a third party directly, and that is intentional. The user's browser holds an anonymous Supabase token and a Vapi public key, nothing else. Every sensitive operation (credit deduction, payment verification, assistant configuration) happens inside an edge function with a JWT-verified user identity.
Five edge functions carry the weight:
| Function | Job |
|---|---|
| smooth-task | Authenticate user, check credits, configure the right Vapi assistant, enforce duration caps |
| end-of-call-webhook | Process Vapi end-of-call payload, save recording, deduct fractional credits |
| create-order | Create a Razorpay order with server-side price-to-credit mapping |
| verify-payment | HMAC-SHA256 signature verification plus idempotent credit addition |
| razorpay-webhook | Backup verification path for payment.captured events |
The whole back end is around 800 lines of TypeScript. No servers to provision, no Docker images, no autoscaling rules.
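As an illustration of the verify-payment row in the table above, here is a minimal sketch of the signature check. It relies on Razorpay's documented checkout scheme, where the signature is a hex-encoded HMAC-SHA256 over `order_id|payment_id` keyed with the API key secret. The function name `isValidRazorpaySignature` is hypothetical, and using `node:crypto` is an assumption (a Deno edge function could equally use Web Crypto); the project's real code may look different.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical helper: true when `signature` matches the HMAC-SHA256 of
// "<orderId>|<paymentId>" computed with the Razorpay key secret.
function isValidRazorpaySignature(
  orderId: string,
  paymentId: string,
  signature: string,
  keySecret: string,
): boolean {
  const expected = createHmac("sha256", keySecret)
    .update(`${orderId}|${paymentId}`)
    .digest("hex");

  const a = Buffer.from(expected, "utf8");
  const b = Buffer.from(signature, "utf8");
  // timingSafeEqual throws on a length mismatch, so check lengths first.
  return a.length === b.length && timingSafeEqual(a, b);
}
```

The constant-time comparison avoids leaking how many leading characters of a forged signature were correct.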
The data layer behind those functions is six tables, all locked down with RLS so users can only ever see their own rows.

The Hard Parts
1. LLM Output Repair
Vapi assistants ship with structured output schemas, but LLMs lie about JSON. Smart quotes slip in. Trailing commas. The occasional unescaped backslash in a "strengths" string. About 18% of the time, JSON.parse() on the raw model output would crash the result page.
The fix is a multi-stage repair pipeline in pitchParser.js:
```js
function parseStructuredOutput(value) {
  if (!value) return null;
  // Already-parsed objects pass straight through
  if (typeof value === "object") return value;
  try {
    return JSON.parse(value);
  } catch {
    // Repair common LLM-isms before retrying
    const repaired = value
      .replace(/[“”]/g, '"') // smart quotes
      .replace(/,\s*}/g, "}") // trailing object commas
      .replace(/,\s*]/g, "]"); // trailing array commas
    try {
      return JSON.parse(repaired);
    } catch {
      return null; // graceful degradation
    }
  }
}
```

After the repair pass, structured output success rate climbed from 82% to 96%. The remaining 4% render a "No assessment available" panel instead of crashing the UI, which is the right failure mode for a feature like this.
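The unescaped-backslash case mentioned earlier is not handled by the three replacements in the snippet. A possible extra repair stage is sketched below; it is an assumption about how such a stage could look, not part of pitchParser.js, and the names `escapeStrayBackslashes` and `parseWithBackslashRepair` are hypothetical. It doubles any backslash that does not begin a valid JSON escape, while leaving valid escapes such as `\n` and `\\` alone.

```typescript
// Hypothetical repair stage: double every backslash that does not start a
// valid JSON escape. The first alternative consumes valid two-character
// escapes so that the second half of "\\" is never treated as stray.
function escapeStrayBackslashes(raw: string): string {
  return raw.replace(/\\(["\\/bfnrtu])|\\/g, (match: string, valid?: string) =>
    valid ? match : "\\\\",
  );
}

// Try a plain parse first; only fall back to the repair when it fails.
function parseWithBackslashRepair(raw: string): unknown {
  try {
    return JSON.parse(raw);
  } catch {
    try {
      return JSON.parse(escapeStrayBackslashes(raw));
    } catch {
      return null; // same graceful degradation as the original parser
    }
  }
}
```

A malformed `\u` sequence (one not followed by four hex digits) would still fail to parse, so this narrows the failure set rather than eliminating it.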
2. Server-Side Duration Enforcement
The credit system is time-based: 1 credit = 1 minute, stored as numeric(10,2) in Postgres so we can deduct 4.37 minutes exactly. But if duration enforcement lives in the React app, anyone with DevTools can extend their call forever.
The solution exploits a feature Vapi already provides: maxDurationSeconds. When the client kicks off a call, the smooth-task edge function calculates the maximum allowed duration as min(remaining_credits, pitch_type_cap), converts to seconds, and passes it to Vapi. The Vapi platform then enforces the limit on its own infrastructure. Even a tampered client gets hung up on at exactly the right moment.
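```ts
function getMaxDuration(assistantType: number, remainingMinutes: number): number {
  // Per-type caps in minutes; any type not listed falls back to 20
  const caps: Record<number, number> = { 2: 3, 7: 10 }; // Elevator: 3min, Demo Day: 10min
  const typeCap = caps[assistantType] ?? 20;
  const effective = Math.min(remainingMinutes, typeCap);
  // Convert to whole seconds and never return a negative duration
  return Math.max(Math.round(effective * 60), 0);
}
```

3. Idempotent Payments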
Razorpay sends payment.captured webhooks. Sometimes twice. Sometimes after the client has already called verify-payment. Without idempotency, a single purchase could credit a user two or three times.
The payments table tracks credits_added as a boolean per payment row. The process_payment Postgres function checks that flag before incrementing credits and sets it atomically inside the same transaction. Two webhook deliveries hitting the same payment race for the row lock; only one wins; the other no-ops.
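The check-then-set logic can be sketched with an in-memory stand-in. Everything here is hypothetical: the `CreditLedger` class, its method names, and the row shape are illustrative only. In the real system the atomicity comes from the Postgres transaction and row lock described above, which a single-threaded TypeScript model does not reproduce.

```typescript
// Hypothetical in-memory stand-in for a row in the payments table.
interface PaymentRow {
  userId: string;
  credits: number;
  creditsAdded: boolean;
}

class CreditLedger {
  private payments: Record<string, PaymentRow> = {};
  private balances: Record<string, number> = {};

  // Insert only if absent, so a replayed order cannot reset the flag.
  recordPayment(paymentId: string, userId: string, credits: number): void {
    if (!this.payments[paymentId]) {
      this.payments[paymentId] = { userId, credits, creditsAdded: false };
    }
  }

  // Returns true only for the delivery that actually added credits.
  processPayment(paymentId: string): boolean {
    const row = this.payments[paymentId];
    if (!row || row.creditsAdded) return false; // unknown or already credited
    row.creditsAdded = true;
    this.balances[row.userId] = (this.balances[row.userId] ?? 0) + row.credits;
    return true;
  }

  balanceOf(userId: string): number {
    return this.balances[userId] ?? 0;
  }
}
```

Whether the second call comes from verify-payment or from a duplicate webhook, it sees the flag already set and leaves the balance untouched.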

The Scoring System
Every session ends with a multi-metric assessment. The Idea Validation assistant scores Problem Clarity, Customer Definition, Evidence, and Communication. The Technical Pitch assistant scores Architecture, Feasibility, Scalability, and Technical Moat. Different rubrics, same shape:
```json
{
  "vc_pitch_assessment": {
    "business_model": { "score": 4, "analysis": "..." },
    "traction": { "score": 3, "analysis": "..." },
    "go_to_market": { "score": 4, "analysis": "..." },
    "pressure_handling": { "score": 3, "analysis": "..." }
  },
  "overall_rating": "Good",
  "strengths": [{ "point": "..." }, { "point": "..." }],
  "weaknesses": [{ "point": "..." }],
  "improvement_suggestions": ["...", "..."]
}
```
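Because every rubric shares this shape, a single runtime check can cover all ten pitch types before the scorecard is rendered. The sketch below is an assumption about how such a check might look; `isMetricResult` and `hasValidMetrics` are hypothetical names, and the rule of exactly four metrics with scores between 1 and 5 is taken from the descriptions in this post.

```typescript
interface MetricResult {
  score: number;
  analysis: string;
}

// True when `value` looks like { score: 1..5, analysis: string }.
function isMetricResult(value: unknown): value is MetricResult {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  const score = record["score"];
  const analysis = record["analysis"];
  return (
    typeof score === "number" && score >= 1 && score <= 5 && typeof analysis === "string"
  );
}

// True when the assessment block holds exactly `expectedCount` valid metrics.
function hasValidMetrics(assessment: unknown, expectedCount = 4): boolean {
  if (typeof assessment !== "object" || assessment === null) return false;
  const record = assessment as Record<string, unknown>;
  const keys = Object.keys(record);
  return keys.length === expectedCount && keys.every((key) => isMetricResult(record[key]));
}
```

A caller would pass the inner block (for example the value of `vc_pitch_assessment`), since the outer key differs per pitch type.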
Across 50 test sessions spanning all 10 pitch types, scores averaged 3.1/5.0 with reasonable difficulty separation: Investor Skeptic Mode produced lower averages (2.5) while Co-Founder Pitch came in higher (3.6). The same pitch delivered three times produced scores that varied by ±0.4, with identical qualitative strengths surfaced each time. Good enough to trust as directional feedback, not so deterministic that it ignores nuance.
Tracking Progress Over Time
A single session is interesting; ten sessions tell a story. The analytics dashboard plots score trends per idea, breaks down skill proficiency on a radar chart, and groups sessions by pitch type so founders can see which scenarios they've been avoiding.

The bigger unlock is session context carry-forward. When you start a new session under the same idea, the AI investor can pull a 1,500-character summary of your last session's strengths and weaknesses directly into its system prompt. Alex remembers that you fumbled the market sizing question last time and asks a sharper follow-up this time. Practice with continuity, not just repetition.
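One way the carry-forward summary could be assembled is sketched below. The function name `buildContextSummary`, the section labels, and the truncation strategy are assumptions; only the 1,500-character budget and the `{ point }` item shape come from this post.

```typescript
// Matches the strengths/weaknesses item shape shown in the scorecard JSON.
interface FeedbackPoint {
  point: string;
}

interface SessionFeedback {
  strengths: FeedbackPoint[];
  weaknesses: FeedbackPoint[];
}

// Hypothetical helper: flatten the last session's feedback into a text
// block no longer than `maxChars`, ready to append to a system prompt.
function buildContextSummary(last: SessionFeedback, maxChars = 1500): string {
  const lines = [
    "Previous session strengths:",
    ...last.strengths.map((s) => `- ${s.point}`),
    "Previous session weaknesses:",
    ...last.weaknesses.map((w) => `- ${w.point}`),
  ];
  const summary = lines.join("\n");
  if (summary.length <= maxChars) return summary;
  // Reserve three characters for the ellipsis so the cap is never exceeded.
  return summary.slice(0, maxChars - 3).replace(/\s+$/, "") + "...";
}
```

Truncating from the end keeps the strengths intact and trims the weaknesses first; a version that prioritized weaknesses might serve the sharper-follow-up goal better.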
What I'd Do Differently
The 4% of unrepairable LLM outputs still bug me. Two paths forward: ask Vapi for retry-on-malformed-JSON at the platform level, or switch to function-calling-style structured output where the schema is enforced before generation completes. The audio-only modality also means we're missing non-verbal signal — eye contact, gestures, slide transitions — which is a real chunk of what investors are reading during a live pitch. Video is the obvious next step, but it's also a much bigger lift.
For now, Pitch Gym does one thing well: it gives founders an always-available, structured, voice-first place to get better at the conversation that matters most.
Project Context
Pitch Gym was developed as my B.Tech Final Year Project at Vellore Institute of Technology (VIT) in 2026. The project was awarded an S grade — the highest possible grade in VIT's grading system — and the accompanying thesis covers the literature review, system design (use case diagrams, ERDs, DFDs, sequence diagrams, state machines), implementation, and testing results across 50 sessions and 52 unit tests.
Stack Summary
Frontend: React 19, Vite 7, Tailwind v4, shadcn/ui, Recharts, react-router-dom v7
Voice: Vapi Web SDK over WebRTC
Backend: Supabase (Postgres, Auth, Edge Functions on Deno, Storage, RLS)
Payments: Razorpay with HMAC-SHA256 verification
Testing: Vitest (52 unit tests, 100% passing on parsing/analytics/context modules)



