Patrick

Reputation: 251

Speech recognition with precise timestamp?

Hi community,

I've been working with Google's speech-to-text API.

When I transcribe a WAV audio file (extracted from a video), the timestamps for some words are not very precise. According to Google the resolution is 0.1 s, but in my case the timestamps are sometimes less accurate than that, or delayed.

I tried slowing down the audio file as a workaround, but the result is more or less the same.

Does anybody know of more precise APIs for speech recognition, or have any hints for preparing the audio files better?

I would like to get every single word together with its exact timestamp.

Thanks a lot!

Upvotes: 5

Views: 4735

Answers (1)

Nikolay Shmyrev

Reputation: 25220

Modern speech recognition algorithms trade alignment accuracy for speed of decoding, so it might be the case that Google's recognizer doesn't assign very accurate timestamps.

More accurate alignment is possible with an open source recognizer like Kaldi; see https://github.com/lowerquality/gentle (a forced aligner built on Kaldi) or something similar. You will have to realign the Google transcript against the audio to get proper timestamps, though.
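As a rough sketch of the realignment step: once the aligner has been given the audio and the transcript from Google, its output can be reduced to a word-by-word list of timestamps. The code below assumes the aligner returns JSON with a `words` list whose entries carry `word`, `start`, `end` (in seconds) and a `case` field that is `"success"` for words it managed to place. This is my understanding of Gentle's output, so check it against the version you run. The function name `extract_word_timings` and the sample data are made up for illustration.

```python
import json


def extract_word_timings(alignment):
    """Return (word, start_sec, end_sec) for every word the aligner placed."""
    timings = []
    for entry in alignment.get("words", []):
        # Assumption: words the aligner could not locate have no
        # start/end and a "case" other than "success"; skip them.
        if entry.get("case") != "success":
            continue
        timings.append((entry["word"], entry["start"], entry["end"]))
    return timings


# Hypothetical sample shaped like the assumed aligner output.
sample = json.loads("""
{
  "transcript": "hello big world",
  "words": [
    {"word": "hello", "case": "success", "start": 0.12, "end": 0.48},
    {"word": "big", "case": "not-found-in-audio"},
    {"word": "world", "case": "success", "start": 0.55, "end": 0.97}
  ]
}
""")

for word, start, end in extract_word_timings(sample):
    print(f"{word}\t{start:.2f}\t{end:.2f}")
```

Words that the aligner could not find are dropped here; depending on your use case you may prefer to keep them and fall back to the timestamps Google reported for those words.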

Upvotes: 3
