DAKSH supports real-time voice input and output through an integrated voice layer. Built on Whisper-like custom speech-to-text models and fast, expressive text-to-speech synthesis engines, this layer enables:
- Automatic voice-to-query transformation
- Confidence-based filtering and re-prompting (see the sketch after this list)
- Query context retention across spoken interactions
- Dynamic response vocalization
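The confidence-based filtering step can be pictured as a small control loop around the speech-to-text model: a transcript is accepted only when the model's reported confidence clears a threshold, and otherwise the user is asked to repeat. The sketch below is illustrative only; the `record`, `transcribe`, and `speak` callables and the `0.75` threshold are assumptions standing in for DAKSH's internal interfaces, which are not described in this document.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Transcript:
    text: str          # normalized text produced by the STT model
    confidence: float  # model-reported confidence in [0.0, 1.0]


def capture_query(
    record: Callable[[], bytes],             # captures one spoken utterance
    transcribe: Callable[[bytes], Transcript],  # Whisper-like STT (placeholder)
    speak: Callable[[str], None],            # TTS used to re-prompt (placeholder)
    threshold: float = 0.75,                 # assumed acceptance threshold
    max_attempts: int = 3,
) -> Optional[str]:
    """Accept a transcript only when confidence clears the threshold;
    otherwise re-prompt the user, up to max_attempts times."""
    for _ in range(max_attempts):
        transcript = transcribe(record())
        if transcript.confidence >= threshold:
            return transcript.text
        speak("Sorry, I did not catch that. Could you please repeat?")
    return None  # caller can fall back to text input or a human agent
```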
When deployed in public-facing scenarios (e.g., kiosks or mobile apps), the voice interface becomes critical to accessibility, particularly for users who prefer native-language conversation or are not tech-savvy. DAKSH currently supports over 15 Indian languages and dialects with fluency and context retention.
The system automatically detects the language of the voice input, maps the utterance to its normalized text equivalent, and processes it through the same pipeline as text queries. The response is then synthesized back into speech, ensuring a full-loop interaction without screen dependency.
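A minimal sketch of that full loop is shown below: language detection and transcription feed the same handler that serves typed queries, and the answer is vocalized in the detected language. All names here (`detect_and_transcribe`, `answer_text_query`, `synthesize_speech`, `VoiceQuery`) are hypothetical placeholders for illustration, not DAKSH's actual APIs.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceQuery:
    language: str  # detected language code, e.g. "hi", "ta", "bn" (assumed convention)
    text: str      # normalized text equivalent of the spoken query


def handle_voice_turn(
    audio: bytes,
    detect_and_transcribe: Callable[[bytes], VoiceQuery],   # language ID + STT (placeholder)
    answer_text_query: Callable[[str, str], str],           # same pipeline used for typed queries
    synthesize_speech: Callable[[str, str], bytes],         # TTS in the detected language (placeholder)
) -> bytes:
    """Run one screen-free interaction: speech in, speech out."""
    query = detect_and_transcribe(audio)
    reply_text = answer_text_query(query.text, query.language)
    return synthesize_speech(reply_text, query.language)
```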