In C#, I'm using Google's Speech Recognition (streaming/real-time, not via pre-recorded audio files). The streaming audio is coming from a Twilio phone call.
About 90% of the time it works flawlessly. The rest of the time I'm getting a JSON parsing error on the payload that Google is sending to me. I've added logging to store the received payload so I can see what Google is sending when the JSON parser fails, and it's this:
{"event":"media","sequenceNumber":"12","media":{"track":"inbound","chunk":"11","timestamp":"309",
"payload":"////////////////////////////////////////
////////////////////////////////////////////////////////////////
/////////////////////////////////////////////////////
That's literally the entire thing. The "payload" value is just a run of slashes, and neither the string nor the enclosing JSON object is ever terminated.
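As a sanity check on what those slashes might represent: in base64, "/" encodes the 6-bit value 63, so a run of slashes decodes to bytes of 0xFF. Assuming the payload is base64-encoded 8-bit mu-law audio (an assumption on my part; the event/track/chunk field names look like Twilio's Media Streams "media" message, which uses that encoding), 0xFF is the mu-law code for zero amplitude, meaning the slashes would simply be silence. A minimal sketch of that check, where SilenceCheck and IsAllMuLawSilence are hypothetical names and not part of my actual code:

```csharp
using System;
using System.Linq;

public static class SilenceCheck
{
    // Returns true when every decoded byte is 0xFF, the mu-law code for zero amplitude.
    // Assumes the input is complete, valid base64 (length a multiple of 4);
    // a truncated payload like the one logged above would need trimming first.
    public static bool IsAllMuLawSilence(string base64Payload)
    {
        byte[] audio = Convert.FromBase64String(base64Payload);
        return audio.Length > 0 && audio.All(b => b == 0xFF);
    }
}
```

If this holds, the slashes themselves are not corruption; the only real problem with the logged message is that it is cut off.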
Additionally, another exception occurs approximately 10 seconds later, during the Google websocket event named "start", while it's listening for speech to be streamed to it:
Grpc.Core.RpcException: Status(StatusCode="OutOfRange",
Detail="Audio Timeout Error: Long duration elapsed without
audio. Audio should be sent close to real time."
The audio timeout error is odd, because there don't seem to be any major gaps in the phone conversation. Each Google Speech Recognition websocket session lasts only a few seconds at most (Twilio asks a question, a websocket to Google is opened, Google listens for a spoken "YES" or "NO" from the user, Google transcribes it, and this is repeated some number of times).
Here's a snippet of the C#:
private async Task ProcessAudioStream_English(AspNetWebSocketContext context)
{
    string jsonData = ""; // NEW - for debugging/exception catching
    try
    {
        WebSocket webSocket = context.WebSocket;
        string streamSid = null;
        var buffer = new byte[1024 * 4];
        var receiveResult = await webSocket.ReceiveAsync(new ArraySegment<byte>(buffer), CancellationToken.None);
        var _speechRecognitionService = new GoogleSpeechRecognition();
        string lang = "en-US";
        Debug.WriteLine($"Language: {lang}");
        while (!receiveResult.CloseStatus.HasValue)
        {
            // Decode only the bytes returned by this single ReceiveAsync call
            jsonData = Encoding.UTF8.GetString(buffer, 0, receiveResult.Count);
            var jsonDocument = JsonDocument.Parse(jsonData); // <-- sometimes has invalid JSON
I initially thought that the buffer size (1024 * 4) might be too small, but unless my tired brain is deceiving me, the above JSON payload is only 265 characters long.
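One thing that may be relevant even though the buffer is large enough: WebSocket.ReceiveAsync is not guaranteed to return a whole message in a single call, and WebSocketReceiveResult.EndOfMessage reports whether more fragments follow. My snippet parses after every call without checking it. A minimal sketch of accumulating fragments before parsing, assuming UTF-8 text messages; FrameReader, AppendFragment and ReadFullMessageAsync are hypothetical names, and I have not confirmed that this is the cause:

```csharp
using System;
using System.IO;
using System.Net.WebSockets;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

public static class FrameReader
{
    // Appends one received fragment to 'pending'. Returns the complete message text
    // once endOfMessage is true (and clears 'pending'); returns null otherwise.
    public static string AppendFragment(MemoryStream pending, byte[] buffer, int count, bool endOfMessage)
    {
        pending.Write(buffer, 0, count);
        if (!endOfMessage)
        {
            return null;
        }
        // Decode only after all fragments have arrived, so multi-byte characters are never split.
        string message = Encoding.UTF8.GetString(pending.ToArray());
        pending.SetLength(0);
        return message;
    }

    // Keeps calling ReceiveAsync until a full message has arrived.
    // Returns null if the peer sends a close frame first.
    public static async Task<string> ReadFullMessageAsync(WebSocket socket, CancellationToken token)
    {
        var buffer = new byte[4 * 1024];
        using (var pending = new MemoryStream())
        {
            while (true)
            {
                WebSocketReceiveResult result =
                    await socket.ReceiveAsync(new ArraySegment<byte>(buffer), token);
                if (result.MessageType == WebSocketMessageType.Close)
                {
                    return null;
                }
                string message = AppendFragment(pending, buffer, result.Count, result.EndOfMessage);
                if (message != null)
                {
                    return message;
                }
            }
        }
    }
}
```

With something like this, JsonDocument.Parse would only ever see the text returned by ReadFullMessageAsync, rather than whatever a single ReceiveAsync call happened to deliver.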
Any idea why Google Speech Recognition (streaming) is sometimes sending me responses with a pile of ////////'s and the JSON isn't well formed?
Thanks!