GavinBrelstaff

Reputation: 3069

Generate timed-text synchronised with Text-to-Speech word-by-word?

How can I generate timed-text (e.g. for subtitles) synchronised with Text-to-Speech (TTS) word-by-word?

I'd like to do this using high-quality SAPI5 voices (e.g. those available from IVONA here), which I have used on Windows 10.

On Windows we already have some good free TTS programs:

  1. Read4Me - open source
  2. Balabolka - closed source
  3. TTSApp - Microsoft's own very basic GUI, currently available here; it seems to date from 2001.

TTSApp can produce WAV audio files. Balabolka creates MP3 files along with synchronised timed-text as LRC files (the format used for karaoke) - BUT only on a line-by-line basis, NOT word-by-word.
However, both highlight each word on screen as they speak it aloud, in real time.

If I had some TTS/SAPI5 source code I could simply check the clock every time a new word starts to be generated and write the time and that word to a file. Does anyone know of any project that exposes that level of programming - so I might start from there?

UPDATE SEPT 2016

I've since discovered that TTSApp was reimplemented in AutoHotkey by a certain jballi in 2012.

I've adapted that code to append the time in ms to a text file every time the onWord event handler fires. I still need to make two passes:

  1. a rapid automated pass to save the WAV file and
  2. a slow (realtime) pass that creates the timing file.

I am still hoping to find a way to accelerate step 2.
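
To illustrate the real-time pass in step 2, here is a minimal sketch in Python rather than AutoHotkey. It is not the adapted script itself: the class name, the on_word method and the injectable clock are all hypothetical, and it assumes the speech engine reports each word as a character position and length within the input text.

```python
import time


class WordTimeLogger:
    """Collects (elapsed_ms, word) pairs as a TTS engine reports each word."""

    def __init__(self, text, clock=time.monotonic):
        self.text = text
        self._clock = clock      # injectable, so the logic can be tested without audio
        self._start = None
        self.entries = []

    def start(self):
        """Call immediately before speech begins."""
        self._start = self._clock()

    def on_word(self, char_pos, length):
        """Hypothetical handler: call it from the engine's word event."""
        elapsed_ms = round((self._clock() - self._start) * 1000)
        self.entries.append((elapsed_ms, self.text[char_pos:char_pos + length]))

    def dump(self):
        """Return one 'milliseconds<TAB>word' line per logged word."""
        return "".join(f"{ms}\t{word}\n" for ms, word in self.entries)
```

Because the timestamps come from the wall clock, this pass can run no faster than the voice speaks, which is exactly the limitation of step 2.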

BTW, the Visual Basic source appears to be archived here.

Upvotes: 4

Views: 1570

Answers (1)

GavinBrelstaff

Reputation: 3069

It is possible to do all of this offline!

You generate a WAV file using SAPI with the DoEvents argument of SpFileStream.Open set to True - documented here.

A binary representation of each event (e.g. phoneme/word/sentence) gets appended to the end of the WAV file. A certain Hans documented the WAV/SAPI format in 2009 here.
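
To see what SAPI has added to a file, you can list its RIFF chunks. The sketch below is in Python and rests on an assumption: that the event data is stored as an extra RIFF chunk after the audio data. It only reports each chunk's ID, offset and size; it does not decode the event records, whose layout is described in Hans's document. The function name list_riff_chunks is hypothetical.

```python
import struct


def list_riff_chunks(data):
    """Return [(chunk_id, payload_offset, payload_size), ...] for RIFF/WAVE bytes."""
    if data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    chunks = []
    pos = 12                                  # skip 'RIFF', total size, 'WAVE'
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        chunks.append((chunk_id.decode("ascii", "replace"), pos + 8, size))
        pos += 8 + size + (size & 1)          # chunks are padded to an even length
    return chunks
```

Any chunk listed after the usual "fmt " and "data" entries is a candidate for the appended event data.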

This can all be done by a simple modification of jballi's 2012 AutoHotkey version of TTSApp.

Basically, you replace these lines of code in Example1GUI.ahk:

SpFileStream.Open(SaveToFileName,SSFMCreateForWrite,False)

;-- Set the output stream to the file stream
SpVoice.AllowAudioOutputFormatChangesOnNextSet:=False
SpVoice.AudioOutputStream:=SpFileStream

;-- Speak using the given flags
SpVoice.Speak(Text,SpeakFlags)

with the following:

SpFileStream.Open(SaveToFileName,SSFMCreateForWrite,True) ;-- Third argument (DoEvents) is now True

;-- Set the output stream to the file stream
SpVoice.AllowAudioOutputFormatChangesOnNextSet:=False
SpVoice.AudioOutputStream:=SpFileStream

if not Sink ;-- Connect SpVoice events to "On"-prefixed handlers (e.g. OnWord), once only
  {
    ComObjConnect(SpVoice, "On")
    Sink:=True
  }

;-- Speak using the given flags
SpVoice.Speak(Text,SpeakFlags|SVSFlagsAsync|SVSFPurgeBeforeSpeak)
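
Once the word events arrive while writing to a file, word-by-word timed text can be derived without checking the clock. The Python sketch below rests on two assumptions that you should verify: that each word event carries a stream position which is a byte offset into the audio data, and that the audio is uncompressed PCM of a known format (the 22050 Hz, 16-bit, mono defaults are placeholders only). All function names are hypothetical. Output is plain LRC, one line per word.

```python
def position_to_ms(stream_pos, sample_rate=22050, bytes_per_sample=2, channels=1):
    """Convert a byte offset in PCM audio to milliseconds (format is an assumption)."""
    bytes_per_second = sample_rate * bytes_per_sample * channels
    return stream_pos * 1000 // bytes_per_second


def lrc_tag(ms):
    """Format milliseconds as an LRC time tag, [mm:ss.xx]."""
    minutes, rest = divmod(ms, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"[{minutes:02d}:{seconds:02d}.{millis // 10:02d}]"


def words_to_lrc(text, events, **pcm_format):
    """events: iterable of (stream_pos, char_pos, length), one tuple per word."""
    lines = []
    for stream_pos, char_pos, length in events:
        word = text[char_pos:char_pos + length]
        lines.append(lrc_tag(position_to_ms(stream_pos, **pcm_format)) + word)
    return "\n".join(lines) + "\n"
```

For example, words_to_lrc("Hello world", [(0, 0, 5), (22050, 6, 5)]) gives one tagged line for "Hello" and one for "world", the second tag being half a second after the first under the default format.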

Upvotes: 0
