Reputation: 1943
I made a script to use the Speech-to-Text API. It works fine with one audio file (an m4a converted to WAV), but it fails a lot (misses most of the text) with another similar audio file (same origin, also an m4a converted to WAV). Both audios sound similar (at least to my ear), but the results are very different. I have set both the metadata and the config, and I don't know what else I can try to improve the results.
Relevant parameters:
metadata = {
    "original_media_type": enums.RecognitionMetadata.OriginalMediaType.AUDIO,
    "original_mime_type": "audio/m4a",
}
sample_rate_hertz = 44100
encoding = enums.RecognitionConfig.AudioEncoding.LINEAR16
config = {
    "metadata": metadata,
    "sample_rate_hertz": sample_rate_hertz,
    "audio_channel_count": 2,
    "language_code": language_code,
    "encoding": encoding,
}
Since one of the files is transcribed with acceptable results, I conclude that my code is OK; that is why I am thinking of changing a parameter to fix the other audio.
Sorry I can't share the original audios.
Upvotes: 0
Views: 1640
Reputation: 835
You could review your audio input; keep in mind that an audio format (the container, e.g. WAV) is not the same as an audio encoding (e.g. LINEAR16).
Based on that, I suggest verifying the encoding actually used in each file, or trying a different one. You can also check the Cloud Speech-to-Text best practices.
Also confirm the supported audio encodings: Cloud Speech-to-Text supports WAV files with LINEAR16- or MULAW-encoded audio.
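You can check which of those two encodings a WAV file actually contains by reading the audio-format code in its `fmt ` chunk (1 = PCM, which maps to LINEAR16; 7 = mu-law, which maps to MULAW). A rough sketch, assuming a standard RIFF/WAVE file and using only the standard library:

```python
import struct

# Format codes from the RIFF/WAVE spec:
#   1 = PCM        -> AudioEncoding.LINEAR16
#   7 = mu-law     -> AudioEncoding.MULAW
def wav_format_code(path):
    """Return the audio-format code from a WAV file's 'fmt ' chunk."""
    with open(path, "rb") as f:
        riff, _, wave_id = struct.unpack("<4sI4s", f.read(12))
        if riff != b"RIFF" or wave_id != b"WAVE":
            raise ValueError("not a RIFF/WAVE file")
        while True:
            header = f.read(8)
            if len(header) < 8:
                raise ValueError("no 'fmt ' chunk found")
            chunk_id, size = struct.unpack("<4sI", header)
            if chunk_id == b"fmt ":
                (code,) = struct.unpack("<H", f.read(2))
                return code
            f.seek(size + (size & 1), 1)  # chunks are word-aligned
```

If this returns anything other than 1 for a file you are sending with `AudioEncoding.LINEAR16`, the declared encoding and the actual encoding disagree, which would explain poor recognition.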
Upvotes: 1