Eshaan Ravish

Reputation: 11

Azure Speech-to-Text: Poor Transcription Accuracy with Client-Side Audio Buffer and Server-Side SDK Approach

I am implementing Azure Speech-to-Text in my application using the following approach:

Client side: Audio is recorded in small buffers and sent to the server every few seconds via WebSocket.

Server side: The buffers are processed using the Azure Speech SDK to convert speech to text.

Issue: The transcription accuracy is poor compared to the client-side SDK implementation. The last few words in each buffer are often missed or incorrectly transcribed. I have also noticed audio data leakage, which seems to be affecting the transcription quality.
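To make the server side concrete, here is a minimal sketch of one way the buffers could be fed to the Speech SDK, not necessarily how my current code does it. It assumes the Python SDK (`azure-cognitiveservices-speech`), raw 16 kHz, 16-bit, mono PCM, and a single long-lived push stream per WebSocket session, so that a word spanning two buffers is not cut at the boundary. The names `transcribe_session`, `bytes_per_second`, and `on_text` are hypothetical.

```python
import threading
from typing import Callable, Iterable

SAMPLE_RATE = 16000    # assumed: 16 kHz
BITS_PER_SAMPLE = 16   # assumed: 16-bit PCM
CHANNELS = 1           # assumed: mono


def bytes_per_second() -> int:
    """Raw PCM bytes produced per second for the assumed format."""
    return SAMPLE_RATE * (BITS_PER_SAMPLE // 8) * CHANNELS


def transcribe_session(buffers: Iterable[bytes], key: str, region: str,
                       on_text: Callable[[str], None]) -> None:
    """Feed every buffer of one session into a single continuous recognition."""
    # Imported lazily so bytes_per_second() works without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    stream_format = speechsdk.audio.AudioStreamFormat(
        samples_per_second=SAMPLE_RATE,
        bits_per_sample=BITS_PER_SAMPLE,
        channels=CHANNELS,
    )
    # One push stream and one recognizer for the whole session,
    # not one per received buffer.
    push_stream = speechsdk.audio.PushAudioInputStream(stream_format=stream_format)
    audio_config = speechsdk.audio.AudioConfig(stream=push_stream)
    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config
    )

    done = threading.Event()

    def handle_recognized(evt):
        # Skip NoMatch results, which carry no text.
        if evt.result.reason == speechsdk.ResultReason.RecognizedSpeech:
            on_text(evt.result.text)

    recognizer.recognized.connect(handle_recognized)
    recognizer.session_stopped.connect(lambda evt: done.set())
    recognizer.canceled.connect(lambda evt: done.set())

    recognizer.start_continuous_recognition()
    for buf in buffers:          # e.g. the payloads read from the WebSocket
        push_stream.write(buf)
    push_stream.close()          # signals end of audio to the recognizer
    done.wait()
    recognizer.stop_continuous_recognition()
```

With this format a 3 s buffer should be `3 * bytes_per_second()` bytes, which is a quick sanity check on what the client actually sends.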

What I have tried so far: I experimented with different buffer sizes (1 s, 3 s, 5 s) to find the best chunk duration, and adjusted the sample rate to match Azure's recommended configuration. I confirmed that the audio buffers are sent and received in the correct sequence without any overlap or gaps. I also analyzed the raw audio data and verified that the recording quality is good.
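The sequence check mentioned above can be sketched roughly as follows. This is an illustration under assumptions, not my exact code: it assumes each WebSocket message carries a sequence number alongside raw 16-bit mono PCM, and the helper name `check_chunks` is hypothetical. It also flags buffers whose length is not a whole number of samples, since an odd byte count would shift every following sample by one byte.

```python
from typing import List, Tuple

BYTES_PER_FRAME = 2  # assumed: 16-bit mono PCM, so 2 bytes per sample frame


def check_chunks(chunks: List[Tuple[int, bytes]]) -> List[str]:
    """Return a list of problems found in (sequence_number, pcm_bytes) pairs."""
    problems = []
    expected_seq = chunks[0][0] if chunks else 0
    for seq, data in chunks:
        # A jump or repeat in the sequence number means a gap or overlap.
        if seq != expected_seq:
            problems.append(f"expected seq {expected_seq}, got {seq}")
        # A length that is not a multiple of the frame size splits a sample.
        if len(data) % BYTES_PER_FRAME != 0:
            problems.append(f"seq {seq}: {len(data)} bytes splits a sample")
        expected_seq = seq + 1
    return problems
```

An empty result only shows that the framing is consistent. It does not rule out a word being cut where one buffer ends and the next begins.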

Answers (0)