Reputation: 251
Hi community,
I've been working with Google's Speech-to-Text API.
When I transcribe a WAV audio file (extracted from a video), the timestamps for some words are not very precise. (According to Google the resolution is 0.1 s, but in my case the timestamps are sometimes further off or delayed.)
I tried a workaround of slowing down the audio file, but the result is more or less the same.
Does anybody know of a more precise API for speech recognition, or have any hints for preparing the audio files better?
I would like to get each word, one by one, together with its exact timestamps.
Thanks a lot!
Upvotes: 5
Views: 4735
Reputation: 25220
Modern speech recognition algorithms trade alignment accuracy for speed of decoding, so it might be the case that Google's recognizer doesn't assign very accurate timestamps.
More accurate alignment is possible with an open-source recognizer such as Kaldi; see https://github.com/lowerquality/gentle (a forced aligner built on Kaldi) or something similar. You will have to realign the Google results to get proper timestamps, though.
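As a rough illustration of the realignment step, here is a minimal Python sketch. It assumes you already have two word lists: one from the recognizer and one from a forced aligner, each as a list of dicts with `word`, `start` and `end` keys (times in seconds). Those field names and the helper `realign_timestamps` are hypothetical and chosen for illustration; the actual output format of Google's API or of gentle may differ, so adapt the keys accordingly. The matching uses `difflib.SequenceMatcher` from the standard library, which is only one possible way to pair up the words.

```python
from difflib import SequenceMatcher


def _normalize(word):
    # Compare words case-insensitively and without surrounding punctuation.
    return word.lower().strip(".,!?;:\"'")


def realign_timestamps(recognized, aligned):
    """Copy start/end times from `aligned` onto matching words in `recognized`.

    Both arguments are lists of dicts with the (assumed) keys
    "word", "start" and "end". Words without a match keep the
    recognizer's original timestamps. The inputs are not modified.
    """
    rec_words = [_normalize(w["word"]) for w in recognized]
    ali_words = [_normalize(w["word"]) for w in aligned]
    result = [dict(w) for w in recognized]  # shallow copy of each entry

    matcher = SequenceMatcher(a=rec_words, b=ali_words, autojunk=False)
    for block in matcher.get_matching_blocks():
        # Each block is a run of `size` identical words in both sequences.
        for k in range(block.size):
            source = aligned[block.b + k]
            result[block.a + k]["start"] = source["start"]
            result[block.a + k]["end"] = source["end"]
    return result


if __name__ == "__main__":
    recognized = [
        {"word": "Hello", "start": 0.0, "end": 0.5},
        {"word": "world", "start": 0.5, "end": 1.0},
    ]
    aligned = [
        {"word": "hello", "start": 0.12, "end": 0.48},
        {"word": "world", "start": 0.61, "end": 0.97},
    ]
    for entry in realign_timestamps(recognized, aligned):
        print(entry["word"], entry["start"], entry["end"])
```

Words that the aligner could not place simply keep the recognizer's timestamps, so you can still tell afterwards which entries were refined and which were not by comparing against the original list.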
Upvotes: 3