Reputation: 11
Hi, I'm using pyttsx3 in Python, which uses the Microsoft SAPI 5.1 text-to-speech synthesizer to generate audio from text. The problem I'm facing is that the speed of the generated speech is not stable: it varies depending on the length of the text, the length of the words, etc. This means the same words are pronounced faster or slower depending on the text they're in. This is a setback for me because I need the timestamp of each word for my program to work properly. So far I have tried different formulas, but none of them are accurate.
Does anyone have an idea how to solve this? (PS: I don't want to use speech analysis to solve this because of reliability issues.)
Upvotes: 0
Views: 309
Reputation: 13932
You need to set up event handlers to get a notification on each word. Apparently pyttsx uses the connect API to set up events:
engine.connect('started-utterance', onStart)
engine.connect('started-word', onWord)
engine.connect('finished-utterance', onEnd)
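The onWord callback receives the utterance name plus the location and length of the word in the text (character offsets rather than a duration, I believe), so you can record the time yourself whenever the event fires.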
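Here is a minimal sketch of how those events could be turned into per-word timestamps. The WordTimer class and the speak_with_timestamps helper are names I made up for illustration, not part of pyttsx3. It assumes the 'started-word' callback is called with name, location and length, and that the event fires roughly when the word starts playing, which may not hold equally well on every driver:

```python
import time


class WordTimer:
    """Hypothetical helper: records when each 'started-word' event fires."""

    def __init__(self, text, clock=time.perf_counter):
        self.text = text
        self.clock = clock      # injectable so the timing logic can be tested
        self.start = None       # clock value at the start of the utterance
        self.words = []         # list of (word, seconds since utterance start)

    def on_start(self, name):
        # Callback for 'started-utterance': remember the reference time.
        self.start = self.clock()

    def on_word(self, name, location, length):
        # Callback for 'started-word': location/length are character offsets
        # into the text, so the timestamp has to be taken here.
        now = self.clock()
        if self.start is None:  # fall back if no utterance event was seen
            self.start = now
        word = self.text[location:location + length]
        self.words.append((word, now - self.start))

    def attach(self, engine):
        # Register both callbacks through the engine's connect API.
        engine.connect('started-utterance', self.on_start)
        engine.connect('started-word', self.on_word)


def speak_with_timestamps(text):
    """Speak text with pyttsx3 and return the recorded (word, offset) pairs."""
    import pyttsx3  # third-party; imported here so WordTimer works without it

    engine = pyttsx3.init()
    timer = WordTimer(text)
    timer.attach(engine)
    engine.say(text)
    engine.runAndWait()
    return timer.words
```

Calling speak_with_timestamps("some text") should then give you a list of (word, seconds) pairs measured while the audio plays, so you don't have to estimate the speed with a formula.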
Upvotes: 1