I designed and deployed a state-of-the-art AI chat and voice receptionist for FamilyHealth Clinic that delivers low-latency, human-like conversations while handling appointment scheduling, FAQs, and operational workflows in real time.
Dental and medical clinics face constant front-desk overload. Staff must manage high call volumes, complex appointment scheduling, and repetitive FAQs while providing a high level of patient care.
Generic chatbots and traditional IVR systems fail here: they sound robotic, respond slowly, lose context between turns, and can't actually book appointments.
The challenge: build a voice-first AI receptionist that feels natural, responds instantly, remembers context, and books real appointments — without introducing operational risk.
FamilyHealth Voice AI is a real-time conversational receptionist powered by a directed orchestration graph and low-latency LLM inference.
Unlike static voice bots, this system performs structured reasoning across multiple services while maintaining conversational fluidity with sub-second response times. It combines precise intent classification with RAG-based knowledge retrieval and live calendar integration.
The system uses a DAG-based (directed acyclic graph) architecture: every conversational state is an explicit node with fixed edges, which guarantees deterministic routing, parallel task execution, and predictable state transitions.
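The routing idea can be sketched in plain Python. The node names and the keyword classifier below are illustrative stand-ins for the LangGraph graph, not the production code:

```python
# Deterministic DAG routing sketch: one classifier node fans out to
# fixed handler nodes through a static edge table.

def classify_intent(state):
    """Label the user's message with an intent (keyword rules stand in
    for the fast LLM classifier)."""
    text = state["message"].lower()
    if "book" in text or "appointment" in text:
        state["intent"] = "booking"
    elif "cancel" in text:
        state["intent"] = "cancel"
    else:
        state["intent"] = "faq"
    return state

def booking_node(state):
    state["reply"] = "Let me check the calendar for open slots."
    return state

def faq_node(state):
    state["reply"] = "Here is what our knowledge base says."
    return state

def cancel_node(state):
    state["reply"] = "I can help cancel that appointment."
    return state

# Edges are a static table: the same intent always reaches the same node,
# which is what makes routing deterministic and transitions predictable.
EDGES = {"booking": booking_node, "faq": faq_node, "cancel": cancel_node}

def run_graph(message):
    state = classify_intent({"message": message})
    return EDGES[state["intent"]](state)

print(run_graph("I'd like to book an appointment")["intent"])  # booking
```

Because the edge table is data rather than prompt text, a branch can be added (say, insurance questions) without risking drift in the existing paths.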
Chosen for graph-based state control rather than linear chains. This prevents memory resets and allows branching logic for booking, FAQs, and cancellations without conversational drift.
Groq’s LPU stack enables sub-second responses, critical for voice systems where silence degrades user trust. Intent classification and reasoning are separated across models to balance speed and depth.
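A minimal sketch of that speed/depth split. The model names are assumptions and the LLM call is stubbed with keyword rules; the point is the escalation logic, not the API:

```python
# Route cheap turns to a fast model and escalate only when the intent
# needs multi-step reasoning. Model names are illustrative.

FAST_MODEL = "llama-3.1-8b-instant"     # low-latency intent classification
DEEP_MODEL = "llama-3.3-70b-versatile"  # heavier reasoning / slot filling

SIMPLE_INTENTS = {"faq", "hours", "greeting"}

def classify(message: str) -> str:
    """Stub for the fast-model intent call (keyword rules stand in)."""
    return "booking" if "appointment" in message.lower() else "faq"

def choose_model(message: str) -> str:
    """Simple lookups stay on the fast model end to end; only intents
    that need multi-step reasoning pay for the large model."""
    intent = classify(message)
    return FAST_MODEL if intent in SIMPLE_INTENTS else DEEP_MODEL

print(choose_model("What are your opening hours?"))  # llama-3.1-8b-instant
```

The split keeps the common case fast while still giving booking flows the depth they need.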
Used for real-time retrieval of clinic procedures, pricing, and operational FAQs. I chose a local BGE-Small embedding model to keep patient data on-premises and eliminate the latency of remote embedding API calls. Parallel async pre-fetching reduces perceived latency by 30–40%.
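The pre-fetching pattern can be shown with stdlib `asyncio` alone; the sleeps stand in for the intent call and the local embedding + vector search, and the durations are illustrative:

```python
import asyncio
import time

async def classify_intent(message: str) -> str:
    await asyncio.sleep(0.05)   # stand-in for the fast intent-model call
    return "faq"

async def retrieve(message: str) -> list:
    await asyncio.sleep(0.08)   # stand-in for embedding + vector search
    return ["Cleanings start at 8am."]

async def sequential(message):
    intent = await classify_intent(message)
    return intent, await retrieve(message)

async def prefetched(message):
    # Start retrieval immediately instead of waiting for classification,
    # so the two calls overlap and total latency ~= max, not sum.
    task = asyncio.create_task(retrieve(message))
    intent = await classify_intent(message)
    return intent, await task

t0 = time.perf_counter(); asyncio.run(sequential("price?")); seq = time.perf_counter() - t0
t0 = time.perf_counter(); asyncio.run(prefetched("price?")); pre = time.perf_counter() - t0
print(f"sequential={seq:.2f}s prefetch={pre:.2f}s")
```

If the classifier decides retrieval wasn't needed, the pre-fetched result is simply discarded; the cost of a wasted local lookup is far smaller than the cost of audible silence.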
Direct API integration enables live availability checks and conflict prevention, transforming the assistant from an informational bot into a functional operational system.
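A conflict check of the kind described might look like this pure-Python sketch; the real system reads booked slots from the calendar API, and the field names here are illustrative:

```python
from datetime import datetime, timedelta

def overlaps(a_start, a_end, b_start, b_end):
    """Two slots conflict iff each starts before the other ends."""
    return a_start < b_end and b_start < a_end

def find_conflict(requested_start, duration_min, booked):
    """Return the first booked slot clashing with the request, else None.
    Back-to-back appointments (end == next start) do not conflict."""
    requested_end = requested_start + timedelta(minutes=duration_min)
    for slot_start, slot_end in booked:
        if overlaps(requested_start, requested_end, slot_start, slot_end):
            return (slot_start, slot_end)
    return None

booked = [(datetime(2025, 3, 3, 9, 0), datetime(2025, 3, 3, 9, 30))]
print(find_conflict(datetime(2025, 3, 3, 9, 15), 30, booked))  # overlapping slot
print(find_conflict(datetime(2025, 3, 3, 10, 0), 30, booked))  # None
```

Running this check against live availability before confirming is what turns "I can note that down" into an actual booking.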
Voice systems cannot tolerate silence. To ensure continuity, responses stream token by token and retrieval is pre-fetched in parallel, so the caller never hears dead air.
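One common way to bridge unavoidable tool latency in a voice system is a timed filler acknowledgment: if the real answer isn't ready within a threshold, say something while waiting. A sketch under that assumption (the phrase and timings are illustrative):

```python
import asyncio

FILLER = "One moment while I check that for you."

async def slow_tool_call():
    await asyncio.sleep(0.3)            # e.g. a live calendar lookup
    return "Tuesday at 10am is open."

async def respond(threshold=0.1):
    """Speak a filler phrase if the answer isn't ready within
    `threshold` seconds, so the caller never hears dead air."""
    spoken = []
    task = asyncio.create_task(slow_tool_call())
    try:
        # shield() keeps the tool call running even if the wait times out
        answer = await asyncio.wait_for(asyncio.shield(task), threshold)
    except asyncio.TimeoutError:
        spoken.append(FILLER)           # bridge the silence
        answer = await task             # then deliver the real answer
    spoken.append(answer)
    return spoken

print(asyncio.run(respond()))
```

Fast answers skip the filler entirely; only calls that would otherwise leave the caller in silence trigger it.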
The system is designed for high-stakes interactions across two distinct layers, ensuring consistent intelligence regardless of the medium.
The backend is a Custom LLM Provider for Vapi, implementing its streaming protocol via Server-Sent Events (SSE). It maps call IDs to LangGraph threads to maintain persistent state and conversational memory throughout the call.
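The call-ID-to-thread mapping reduces to a keyed store. An in-memory sketch of the idea (the production backend persists this per LangGraph thread; the class and field names here are illustrative):

```python
# Map each call ID to one persistent conversation thread, so every
# request within the same call shares the same memory.

class ThreadStore:
    def __init__(self):
        self._threads = {}

    def get(self, call_id):
        """Return the message history for this call, creating it on
        first use."""
        return self._threads.setdefault(call_id, [])

    def append(self, call_id, role, text):
        self.get(call_id).append({"role": role, "content": text})

store = ThreadStore()
store.append("call-123", "user", "Do you take walk-ins?")
store.append("call-123", "assistant", "We do, before 11am.")
print(len(store.get("call-123")))  # 2 -- both turns share one thread
```

Because the key is the call ID rather than the HTTP request, the assistant remembers earlier turns even though each SSE request arrives independently.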
The companion web interface, built in TypeScript, provides real-time streaming with status indicators for RAG and calendar lookups. Each session maintains thread persistence, allowing users to switch between voice and text without losing context.
This project demonstrates applied AI engineering beyond experimentation — with real deployment constraints in mind.
▶ Try the system live on Hugging Face
An interactive demo running in a production-style environment.
Voice interfaces demand more than accurate responses — they require timing, reliability, and operational integration. This system shows how Voice AI can move beyond scripted interactions to become a functional part of real-world service infrastructure.
I specialize in designing and deploying production-grade AI agents that solve real operational challenges. Let's discuss how we can automate your high-stakes workflows.
Contact Me