I designed and deployed a state-of-the-art AI chat and voice receptionist for FamilyHealth Clinic that delivers low-latency, human-like conversations while handling appointment scheduling, FAQs, and operational workflows in real time.
Dental and medical clinics face constant front-desk overload. Staff must manage high call volumes, complex appointment scheduling, and repetitive FAQs while providing a high level of patient care.
Generic chatbots and traditional IVR systems fail here: they sound robotic, respond slowly, lose context between turns, and can't actually book appointments.
The challenge: build a voice-first AI receptionist that feels natural, responds instantly, remembers context, and books real appointments — without introducing operational risk.
FamilyHealth Voice AI is a real-time conversational receptionist powered by a directed orchestration graph and low-latency LLM inference.
Unlike static voice bots, this system performs structured reasoning across multiple services while maintaining conversational fluidity with sub-second response times. It combines precise intent classification with RAG-based knowledge retrieval and live calendar integration.
The system uses a DAG-based (directed acyclic graph) architecture: every conversational state is an explicit node with fixed edges, which guarantees deterministic routing, parallel task execution, and predictable state transitions.
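The routing idea can be sketched in plain Python. The node names and the keyword classifier below are illustrative stand-ins for the LangGraph graph, not the production code:

```python
# Deterministic DAG routing sketch: one classifier node fans out to
# fixed handler nodes through a static edge table.

def classify_intent(state):
    """Label the user's message with an intent (keyword rules stand in
    for the fast LLM classifier)."""
    text = state["message"].lower()
    if "book" in text or "appointment" in text:
        state["intent"] = "booking"
    elif "cancel" in text:
        state["intent"] = "cancel"
    else:
        state["intent"] = "faq"
    return state

def booking_node(state):
    state["reply"] = "Let me check the calendar for open slots."
    return state

def faq_node(state):
    state["reply"] = "Here is what our knowledge base says."
    return state

def cancel_node(state):
    state["reply"] = "I can help cancel that appointment."
    return state

# Edges are a static table: the same intent always reaches the same node,
# which is what makes routing deterministic and transitions predictable.
EDGES = {"booking": booking_node, "faq": faq_node, "cancel": cancel_node}

def run_graph(message):
    state = classify_intent({"message": message})
    return EDGES[state["intent"]](state)

print(run_graph("I'd like to book an appointment")["intent"])  # booking
```

Because the edge table is data rather than prompt text, a branch can be added (say, insurance questions) without risking drift in the existing paths.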
Chosen for graph-based state control rather than linear chains. This prevents memory resets and allows branching logic for booking, FAQs, and cancellations without conversational drift.
Groq’s LPU stack enables sub-second responses, critical for voice systems where silence degrades user trust. Intent classification and reasoning are separated across models to balance speed and depth.
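A minimal sketch of that speed/depth split. The model names are assumptions and the LLM call is stubbed with keyword rules; the point is the escalation logic, not the API:

```python
# Route cheap turns to a fast model and escalate only when the intent
# needs multi-step reasoning. Model names are illustrative.

FAST_MODEL = "llama-3.1-8b-instant"     # low-latency intent classification
DEEP_MODEL = "llama-3.3-70b-versatile"  # heavier reasoning / slot filling

SIMPLE_INTENTS = {"faq", "hours", "greeting"}

def classify(message: str) -> str:
    """Stub for the fast-model intent call (keyword rules stand in)."""
    return "booking" if "appointment" in message.lower() else "faq"

def choose_model(message: str) -> str:
    """Simple lookups stay on the fast model end to end; only intents
    that need multi-step reasoning pay for the large model."""
    intent = classify(message)
    return FAST_MODEL if intent in SIMPLE_INTENTS else DEEP_MODEL

print(choose_model("What are your opening hours?"))  # llama-3.1-8b-instant
```

The split keeps the common case fast while still giving booking flows the depth they need.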
Used for real-time retrieval of clinic procedures, pricing, and operational FAQs. I chose a local BGE-Small embedding model to keep patient data on-premises and eliminate the latency of remote embedding API calls. Parallel async pre-fetching reduces perceived latency by 30–40%.
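The pre-fetching pattern can be shown with stdlib `asyncio` alone; the sleeps stand in for the intent call and the local embedding + vector search, and the durations are illustrative:

```python
import asyncio
import time

async def classify_intent(message: str) -> str:
    await asyncio.sleep(0.05)   # stand-in for the fast intent-model call
    return "faq"

async def retrieve(message: str) -> list:
    await asyncio.sleep(0.08)   # stand-in for embedding + vector search
    return ["Cleanings start at 8am."]

async def sequential(message):
    intent = await classify_intent(message)
    return intent, await retrieve(message)

async def prefetched(message):
    # Start retrieval immediately instead of waiting for classification,
    # so the two calls overlap and total latency ~= max, not sum.
    task = asyncio.create_task(retrieve(message))
    intent = await classify_intent(message)
    return intent, await task

t0 = time.perf_counter(); asyncio.run(sequential("price?")); seq = time.perf_counter() - t0
t0 = time.perf_counter(); asyncio.run(prefetched("price?")); pre = time.perf_counter() - t0
print(f"sequential={seq:.2f}s prefetch={pre:.2f}s")
```

If the classifier decides retrieval wasn't needed, the pre-fetched result is simply discarded; the cost of a wasted local lookup is far smaller than the cost of audible silence.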
Direct API integration enables live availability checks and conflict prevention, transforming the assistant from an informational bot into a functional operational system.
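A conflict check of the kind described might look like this pure-Python sketch; the real system reads booked slots from the calendar API, and the field names here are illustrative:

```python
from datetime import datetime, timedelta

def overlaps(a_start, a_end, b_start, b_end):
    """Two slots conflict iff each starts before the other ends."""
    return a_start < b_end and b_start < a_end

def find_conflict(requested_start, duration_min, booked):
    """Return the first booked slot clashing with the request, else None.
    Back-to-back appointments (end == next start) do not conflict."""
    requested_end = requested_start + timedelta(minutes=duration_min)
    for slot_start, slot_end in booked:
        if overlaps(requested_start, requested_end, slot_start, slot_end):
            return (slot_start, slot_end)
    return None

booked = [(datetime(2025, 3, 3, 9, 0), datetime(2025, 3, 3, 9, 30))]
print(find_conflict(datetime(2025, 3, 3, 9, 15), 30, booked))  # overlapping slot
print(find_conflict(datetime(2025, 3, 3, 10, 0), 30, booked))  # None
```

Running this check against live availability before confirming is what turns "I can note that down" into an actual booking.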
Voice systems cannot tolerate silence. To ensure continuity, responses stream token by token and retrieval is pre-fetched in parallel, so the caller never hears dead air.
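One common way to bridge unavoidable tool latency in a voice system is a timed filler acknowledgment: if the real answer isn't ready within a threshold, say something while waiting. A sketch under that assumption (the phrase and timings are illustrative):

```python
import asyncio

FILLER = "One moment while I check that for you."

async def slow_tool_call():
    await asyncio.sleep(0.3)            # e.g. a live calendar lookup
    return "Tuesday at 10am is open."

async def respond(threshold=0.1):
    """Speak a filler phrase if the answer isn't ready within
    `threshold` seconds, so the caller never hears dead air."""
    spoken = []
    task = asyncio.create_task(slow_tool_call())
    try:
        # shield() keeps the tool call running even if the wait times out
        answer = await asyncio.wait_for(asyncio.shield(task), threshold)
    except asyncio.TimeoutError:
        spoken.append(FILLER)           # bridge the silence
        answer = await task             # then deliver the real answer
    spoken.append(answer)
    return spoken

print(asyncio.run(respond()))
```

Fast answers skip the filler entirely; only calls that would otherwise leave the caller in silence trigger it.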
The system is designed for high-stakes interactions across two distinct layers, ensuring consistent intelligence regardless of the medium.
The backend is a Custom LLM Provider for Vapi, implementing its streaming protocol via Server-Sent Events (SSE). It maps call IDs to LangGraph threads to maintain persistent state and conversational memory throughout the call.
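The call-ID-to-thread mapping reduces to a keyed store. An in-memory sketch of the idea (the production backend persists this per LangGraph thread; the class and field names here are illustrative):

```python
# Map each call ID to one persistent conversation thread, so every
# request within the same call shares the same memory.

class ThreadStore:
    def __init__(self):
        self._threads = {}

    def get(self, call_id):
        """Return the message history for this call, creating it on
        first use."""
        return self._threads.setdefault(call_id, [])

    def append(self, call_id, role, text):
        self.get(call_id).append({"role": role, "content": text})

store = ThreadStore()
store.append("call-123", "user", "Do you take walk-ins?")
store.append("call-123", "assistant", "We do, before 11am.")
print(len(store.get("call-123")))  # 2 -- both turns share one thread
```

Because the key is the call ID rather than the HTTP request, the assistant remembers earlier turns even though each SSE request arrives independently.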
The companion web interface, built in TypeScript, provides real-time streaming with status indicators for RAG and calendar lookups. Each session maintains thread persistence, allowing users to switch between voice and text without losing context.
This project demonstrates applied AI engineering beyond experimentation — with real deployment constraints in mind.
▶ Try the system live on Hugging Face
An interactive demo running in a production-style environment.
Voice interfaces demand more than accurate responses — they require timing, reliability, and operational integration. This system shows how Voice AI can move beyond scripted interactions to become a functional part of real-world service infrastructure.
I specialize in designing and deploying production-grade AI agents that solve real operational challenges. Let's discuss how we can automate your high-stakes workflows.
Contact Me