Reputation: 63
I have been using the Chromium Google Speech API and recently switched over to the Google Cloud Speech API. Ever since the Google Cloud Speech API was announced, recognition accuracy seems to have degraded. I am also seeing more and more "empty results" coming back for streamed audio.
I stream the same audio simultaneously to several different services, and the Google Cloud Speech API returns an empty result while some of the other services return transcribed text. It makes me wonder whether anything changed in the way the Chromium Speech API and the Google Cloud Speech API work.
I have checked that the audio has proper headers and confirmed that I am actually streaming it to Google.
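One way to double-check the header claim is a small sketch with Python's standard-library wave module. It assumes the audio is a PCM WAV file and that the request config expects LINEAR16, 16 kHz, mono; the expected values (EXPECTED_*) and the function name wav_problems are hypothetical, so adjust them to whatever your config actually says. A mismatch between the file and the config is one plausible cause of poor or empty results, not a confirmed one.

```python
import io
import wave

# Hypothetical expected values for a LINEAR16 / 16 kHz / mono request config.
EXPECTED_CHANNELS = 1
EXPECTED_SAMPLE_WIDTH = 2  # bytes per sample, i.e. 16-bit
EXPECTED_RATE = 16000      # Hz


def wav_problems(wav_bytes: bytes) -> list[str]:
    """Return header mismatches; an empty list means the header matches the config."""
    problems = []
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        if w.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"channels={w.getnchannels()}, expected {EXPECTED_CHANNELS}")
        if w.getsampwidth() != EXPECTED_SAMPLE_WIDTH:
            problems.append(f"sample width={w.getsampwidth()} bytes, expected {EXPECTED_SAMPLE_WIDTH}")
        if w.getframerate() != EXPECTED_RATE:
            problems.append(f"rate={w.getframerate()} Hz, expected {EXPECTED_RATE}")
        if w.getnframes() == 0:
            problems.append("no audio frames")
    return problems
```

Note that wave only reads uncompressed PCM WAV files; anything else raises wave.Error, which is itself a useful signal.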
Is anyone else seeing Google return empty results sometimes (in my case, the majority of the time)?
Upvotes: 6
Views: 3543
Reputation: 11
I also have the same problem: the Google Speech API returned an empty result. I used FFmpeg to convert my audio file to LINEAR16. To install this tool, I used Homebrew:
brew install ffmpeg
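To convert the audio file to LINEAR16 (headerless signed 16-bit little-endian PCM), use this command; -ac 1 and -ar 16000 make the output mono at 16 kHz so that it matches the sampleRate in the request config:
ffmpeg -i input.flac -ac 1 -ar 16000 -f s16le -acodec pcm_s16le output.raw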
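Before uploading, a quick sanity check on the converted file can catch two easy mistakes: a leftover WAV header, or audio that ended up silent. This is a hypothetical helper (describe_linear16), not part of any Google tooling, and it assumes the .raw file is headerless mono 16-bit little-endian PCM at 16 kHz, matching the sampleRate in the request config.

```python
import array
import sys

SAMPLE_RATE = 16000  # assumption: must equal "sampleRate" in the request config


def describe_linear16(raw: bytes, sample_rate: int = SAMPLE_RATE) -> dict:
    """Summarise headerless mono 16-bit little-endian PCM (what -f s16le produces)."""
    if raw[:4] == b"RIFF":
        raise ValueError("looks like a WAV file, not headerless raw PCM")
    if len(raw) % 2:
        raise ValueError("odd byte count: not 16-bit samples")
    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # s16le data is little-endian
    peak = max((abs(s) for s in samples), default=0)
    return {
        "seconds": len(samples) / sample_rate,
        "peak": peak,  # 0 means pure digital silence
    }
```

If "seconds" is far from the real length of the recording, the sample rate or channel count assumed here does not match the file.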
Then I uploaded it to my Google Cloud Storage bucket: https://console.cloud.google.com/storage/browser/
Here is my JSON request file (sync-request.json) with the config:
{
  "config": {
    "encoding": "LINEAR16",
    "sampleRate": 16000,
    "languageCode": "en-US"
  },
  "audio": {
    "uri": "gs://your-bucket-name/output.raw"
  }
}
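Strict JSON requires double-quoted keys and strings, so a file written with single quotes may be rejected by a strict parser. Generating the file with Python's json module sidesteps quoting mistakes. This sketch reuses the v1beta1 field names from the request above; the bucket URI is still a placeholder.

```python
import json

# Same fields as the request above (v1beta1 naming: "sampleRate").
request = {
    "config": {
        "encoding": "LINEAR16",
        "sampleRate": 16000,
        "languageCode": "en-US",
    },
    "audio": {"uri": "gs://your-bucket-name/output.raw"},  # placeholder bucket
}

# json.dumps always emits double-quoted keys and strings.
body = json.dumps(request, indent=2)

if __name__ == "__main__":
    with open("sync-request.json", "w", encoding="utf-8") as f:
        f.write(body)
```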
For files longer than 1 minute, you need to use the asyncrecognize method:
curl -s -k -H "Content-Type: application/json" \
-H "Authorization: Bearer [YOUR-KEY]" \
https://speech.googleapis.com/v1beta1/speech:asyncrecognize \
-d @sync-request.json
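For reference, the same asyncrecognize call can be built with Python's standard library. This is a sketch, not an official client: build_async_request and send are hypothetical names, token stands for whatever goes in [YOUR-KEY], and the only assumption about the reply is that it is a long-running operation object with a "name" field. Unlike curl's -k flag, urllib verifies TLS certificates by default.

```python
import json
import urllib.request

# v1beta1 endpoint as used in the curl command above.
ASYNC_URL = "https://speech.googleapis.com/v1beta1/speech:asyncrecognize"


def build_async_request(body: dict, token: str) -> urllib.request.Request:
    """Build (but do not send) the same POST that the curl command makes."""
    return urllib.request.Request(
        ASYNC_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def send(req: urllib.request.Request) -> str:
    """Send the request and return the operation name from the JSON reply."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["name"]
```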
It will return an operation ID. You can check whether the operation is done by fetching its result:
curl -s -k -H "Content-Type: application/json" \
-H "Authorization: Bearer [YOUR-KEY]" \
https://speech.googleapis.com/v1beta1/operations/[OPERATION-ID]
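The status check can be wrapped in a polling loop. A long-running operation reports "done": true once it finishes, along with either an "error" or a "response"; a response with no results is the empty result discussed in this thread. fetch_operation, wait_for_operation and transcripts are hypothetical helpers, and the 5-second interval and 10-minute timeout are arbitrary choices.

```python
import json
import time
import urllib.request
from typing import Callable

OPERATIONS_URL = "https://speech.googleapis.com/v1beta1/operations/"


def fetch_operation(name: str, token: str) -> dict:
    """GET the operation status, like the curl command above."""
    req = urllib.request.Request(
        OPERATIONS_URL + name,
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_operation(fetch: Callable[[], dict], interval: float = 5.0,
                       timeout: float = 600.0) -> dict:
    """Poll until the operation is done; return its response or raise on error."""
    deadline = time.monotonic() + timeout
    while True:
        op = fetch()
        if op.get("done"):
            if "error" in op:
                raise RuntimeError(op["error"])
            return op.get("response", {})
        if time.monotonic() >= deadline:
            raise TimeoutError("operation still running")
        time.sleep(interval)


def transcripts(response: dict) -> list[str]:
    """Top transcript per result; an empty list is the "empty result" case."""
    return [
        r["alternatives"][0].get("transcript", "")
        for r in response.get("results", [])
        if r.get("alternatives")
    ]
```

Typical use would be `transcripts(wait_for_operation(lambda: fetch_operation(name, token)))`, where name is the operation ID returned by asyncrecognize.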
Upvotes: 1
Reputation: 1108
I was also receiving empty responses, but eventually got results by re-encoding the audio as headerless mono, 16-bit signed, little-endian PCM at 16 kHz:
sox async.wav -t raw --channels=1 --bits=16 --rate=16000 --encoding=signed-integer --endian=little async.raw
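The sox command above produces headerless mono, 16-bit signed, little-endian, 16 kHz samples. When a WAV file is already in exactly that format, the same bytes can be extracted in Python by stripping the header with the wave module. This hypothetical helper (wav_to_linear16) deliberately does not resample or downmix; for any other format, sox or ffmpeg is still needed.

```python
import array
import io
import sys
import wave


def wav_to_linear16(wav_bytes: bytes) -> bytes:
    """Return the PCM frames of a mono 16-bit 16 kHz WAV, i.e. strip the header.

    Unlike sox, this does NOT resample or downmix; other formats are refused.
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        fmt = (w.getnchannels(), w.getsampwidth(), w.getframerate())
        if fmt != (1, 2, 16000):
            raise ValueError(f"need (channels, bytes/sample, Hz) = (1, 2, 16000), got {fmt}")
        frames = w.readframes(w.getnframes())
    if sys.byteorder == "big":
        # wave returns native byte order on big-endian hosts; LINEAR16 here is little-endian.
        samples = array.array("h", frames)
        samples.byteswap()
        frames = samples.tobytes()
    return frames
```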
Upvotes: 1
Reputation: 315
This type of question is more appropriate for the Public Issue Tracker, as it would require further details to reproduce your exact errors. Make sure to fill in this form with the required information, or at least with a minimal working example of your code that clearly highlights the problem. For an accurate reproduction, it would be important to provide the sample code or commands you executed that returned the error, along with the configuration files and the URIs (or files) of the audio you streamed that returned empty results.
As a matter of fact, there are known issues with the Speech API, which is currently in beta, that may prevent transcription from working correctly. In the meantime, you may refer to the following documentation to determine whether any of the best practices apply to your case.
Upvotes: 3