Reputation: 5005
I successfully obtained the transcript and alternatives for a 5-minute audio file using the Google Cloud Speech API (longrunningrecognize), but I'm not getting the full text of those 5 minutes, only a short transcript, as seen below:
{
  "name": "2340863807845687922",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeMetadata",
    "progressPercent": 100,
    "startTime": "2018-09-20T13:25:57.948053Z",
    "lastUpdateTime": "2018-09-20T13:28:18.406147Z"
  },
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeResponse",
    "results": [
      {
        "alternatives": [
          {
            "transcript": "I am recording it. I think",
            "confidence": 0.9223639
          }
        ]
      },
      {
        "alternatives": [
          {
            "transcript": "these techniques properly stated",
            "confidence": 0.9190353
          }
        ]
      }
    ]
  }
}
How do I get the full text generated by the transcription?
Upvotes: 3
Views: 1845
Reputation: 164
Google Cloud Speech-to-Text provides very accurate results. For long audio, it returns the transcript broken into consecutive chunks, as you observed: the results array holds one entry per chunk, and each entry has its own alternatives array. What I did was set MaxAlternatives = 1 in my recognition config and then concatenate the transcript of every entry in results to get the full transcript. My recognition config in C# using Google.Cloud.Speech.V1 is given below:
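using Google.Cloud.Speech.V1;

var config = new RecognitionConfig
{
    Encoding = RecognitionConfig.Types.AudioEncoding.Linear16,
    // Optional for WAV/FLAC files: the sample rate is read from the file header
    //SampleRateHertz = 16000,
    LanguageCode = "en",
    EnableWordTimeOffsets = true,
    // Keep only the top hypothesis for each result chunk
    MaxAlternatives = 1
};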
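The concatenation step described above can be sketched in Python. This is a minimal sketch, not code from the answer: it assumes you already have the operation JSON shown in the question as text (for example, saved from the curl response), and the function names full_transcript and full_transcript_from_operation are hypothetical. It takes alternatives[0], the most likely hypothesis, from every entry in results and joins them with spaces.

```python
import json


def full_transcript(response: dict) -> str:
    """Join the top alternative of every result chunk into one string."""
    pieces = []
    # Each entry in "results" covers a consecutive stretch of the audio.
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            # alternatives[0] is the most likely hypothesis for that stretch;
            # strip() removes any leading space the API puts on later chunks.
            pieces.append(alternatives[0].get("transcript", "").strip())
    return " ".join(p for p in pieces if p)


def full_transcript_from_operation(operation_json: str) -> str:
    """Accept the whole operation JSON (name, metadata, done, response)."""
    operation = json.loads(operation_json)
    return full_transcript(operation.get("response", {}))


if __name__ == "__main__":
    sample = """{"done": true, "response": {"results": [
        {"alternatives": [{"transcript": "I am recording it. I think", "confidence": 0.92}]},
        {"alternatives": [{"transcript": "these techniques properly stated", "confidence": 0.92}]}
    ]}}"""
    print(full_transcript_from_operation(sample))
```

Run on the response from the question, this prints "I am recording it. I think these techniques properly stated". If the joined text is still far shorter than the audio, the problem is likely in recognition rather than in reading the response.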
Upvotes: 1
Reputation: 5005
I successfully solved this issue. I had to properly convert the file with ffmpeg:
$ ffmpeg -i /home/user/audio_test.wav -ac 1 -ab 8k audio_test2.wav
Then remove silence with sox:
$ sox audio_test2.wav audio_no_silence4.wav silence -l 1 0.1 1% -1 2.0 1%
And fix my sync-request.json:
{
  "config": {
    "encoding": "MULAW",
    "sampleRateHertz": 8000,
    "languageCode": "pt-BR",
    "enableWordTimeOffsets": false,
    "enableAutomaticPunctuation": false,
    "enableSpeakerDiarization": true,
    "useEnhanced": true,
    "diarizationSpeakerCount": 2,
    "audioChannelCount": 1
  },
  "audio": {
    "uri": "gs://storage/audio_no_silence4.wav"
  }
}
Then I ran the request with curl. It is working perfectly now.
Upvotes: 1
Reputation: 25210
The Google Speech API is a very painful thing to work with. Besides not being able to transcribe long files, it randomly skips large chunks of audio in the transcription. Possible solutions are:
Upvotes: 1