Reputation: 1943
I made a script to use the Speech-to-Text API. It works fine with one audio file (an m4a converted to WAV), but it fails a lot (misses most of the text) with another similar audio file (same origin, also an m4a converted to WAV). Both audios sound similar (at least to my ear), but the results are very different. I have set both the metadata and the config, and I don't know what else I can try to improve the results.
Relevant parameters:
metadata = {
    "original_media_type": enums.RecognitionMetadata.OriginalMediaType.AUDIO,
    "original_mime_type": "audio/m4a",
}
sample_rate_hertz = 44100
encoding = enums.RecognitionConfig.AudioEncoding.LINEAR16
config = {
    "metadata": metadata,
    "sample_rate_hertz": sample_rate_hertz,
    "audio_channel_count": 2,
    "language_code": language_code,
    "encoding": encoding,
}
Since one of the files is transcribed with acceptable results, I conclude that my code is OK; that is why I am thinking of changing a parameter to fix the other audio.
Sorry I can't share the original audios.
Upvotes: 0
Views: 1640
Reputation: 835
You could review your audio input; keep in mind that an audio format (the container, e.g. WAV) is not the same as an audio encoding (e.g. LINEAR16).
Based on that, I suggest verifying the encoding actually used in each file, or trying a different one. You can also check the Cloud Speech-to-Text best practices.
Also confirm the supported audio encodings: Cloud Speech-to-Text supports WAV files with LINEAR16- or MULAW-encoded audio.
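You can check which of those two encodings a WAV file actually contains by reading the audio-format code in its `fmt ` chunk (1 = PCM, which maps to LINEAR16; 7 = mu-law, which maps to MULAW). A rough sketch, assuming a standard RIFF/WAVE file and using only the standard library:

```python
import struct

# Format codes from the RIFF/WAVE spec:
#   1 = PCM        -> AudioEncoding.LINEAR16
#   7 = mu-law     -> AudioEncoding.MULAW
def wav_format_code(path):
    """Return the audio-format code from a WAV file's 'fmt ' chunk."""
    with open(path, "rb") as f:
        riff, _, wave_id = struct.unpack("<4sI4s", f.read(12))
        if riff != b"RIFF" or wave_id != b"WAVE":
            raise ValueError("not a RIFF/WAVE file")
        while True:
            header = f.read(8)
            if len(header) < 8:
                raise ValueError("no 'fmt ' chunk found")
            chunk_id, size = struct.unpack("<4sI", header)
            if chunk_id == b"fmt ":
                (code,) = struct.unpack("<H", f.read(2))
                return code
            f.seek(size + (size & 1), 1)  # chunks are word-aligned
```

If this returns anything other than 1 for a file you are sending with `AudioEncoding.LINEAR16`, the declared encoding and the actual encoding disagree, which would explain poor recognition.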
Upvotes: 1