Christian

Reputation: 26427

Which data structure for linking text with audio in Java

I want to write a program that plays an audio file of a text being read aloud. I want to highlight the syllable currently being played in green and the rest of the current word in red. What kind of data structure should I use to store the audio file and the information that tells the program when to switch to the next word or syllable?

Upvotes: 2

Views: 436

Answers (4)

Marcus Downing

Reputation: 10141

This is a slightly left-field suggestion, but have you looked at karaoke software? It may not be seen as "serious" enough, but it sounds very similar to what you're doing. For example, Aegisub is a subtitling program that lets you create subtitles in the SSA/ASS format. It has karaoke tools for highlighting the chosen word or part of a word.

It's most commonly used for subtitling anime, but it also works for audio provided you have a suitable player. These are sadly quite rare on the Mac.

The format looks similar to the one proposed by Yuval A:

{\K132}Unmei {\K34}no {\K54}tobira
{\K60}{\K132}yukkuri {\K36}to {\K142}hirakareta

The lengths are durations rather than absolute offsets. This makes it easier to shift the start of the line without recalculating all the offsets. The double entry at the start of the second line ({\K60}{\K132}) indicates a pause before the first syllable.
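If you do end up reading this format from Java, the durations have to be accumulated into absolute times before you can compare them with the playback position. Here is a minimal sketch of that step, assuming the \K values are centiseconds (as in ASS karaoke tags) and that every piece of text is preceded by exactly one tag. The class and method names (KaraokeLine, Segment, parse) are made up for illustration and are not part of Aegisub or any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: turns one karaoke line into segments with absolute times. */
class KaraokeLine {
    // One {\K<duration>} tag followed by the text up to the next tag (possibly empty).
    private static final Pattern TAG = Pattern.compile("\\{\\\\[kK](\\d+)\\}([^{]*)");

    /** A piece of text together with the time window in which it is highlighted. */
    static final class Segment {
        final String text;
        final long startMillis;
        final long endMillis;

        Segment(String text, long startMillis, long endMillis) {
            this.text = text;
            this.startMillis = startMillis;
            this.endMillis = endMillis;
        }
    }

    /** Accumulates the per-tag durations, starting from the line's own start time. */
    static List<Segment> parse(String line, long lineStartMillis) {
        List<Segment> segments = new ArrayList<>();
        long cursor = lineStartMillis;
        Matcher m = TAG.matcher(line);
        while (m.find()) {
            long duration = Long.parseLong(m.group(1)) * 10; // assumed centiseconds -> ms
            segments.add(new Segment(m.group(2), cursor, cursor + duration));
            cursor += duration;
        }
        return segments;
    }

    public static void main(String[] args) {
        String line = "{\\K60}{\\K132}yukkuri {\\K36}to {\\K142}hirakareta";
        for (Segment s : parse(line, 0)) {
            // A segment with empty text is a pause: nothing is highlighted.
            System.out.println(s.startMillis + "-" + s.endMillis + " ms: \"" + s.text + "\"");
        }
    }
}
```

Shifting a whole line then only means passing a different lineStartMillis; the tags themselves stay untouched.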

Is there a good reason this needs to be part of your Java program, or is an off-the-shelf solution possible?

Upvotes: 3

jim

Reputation: 1552

Highlighting part of a word sounds like you're getting into phonetics, the sounds that make up words. It's going to be really difficult to turn a sound file into something that will "read" a text. Your best bet is to use the text itself to drive a phonetics-based engine like FreeTTS, which implements part of the Java Speech API.

To do this you're going to have to take the text to be read, split it into phonetic syllables, and play each one. So "syllable" becomes "syl", "la", "ble". Playback would then be: highlight "syl", say it, and move on to the next one.

This is really "old-skool"; it was done the same way on the original Apple II.

Upvotes: 1

Yuval Adam

Reputation: 165292

How about a simple data structure that describes which batch of letters makes up the next syllable, along with the timestamp for switching to that syllable?

Just a quick example:

[0:00] This [0:02] is [0:05] an [0:07] ex- [0:08] am- [0:10] ple
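In Java, one way to hold such a list is a sorted map from start time to syllable, so that "what is being spoken right now?" becomes a single lookup. This is only a sketch under a few assumptions: times are in milliseconds, each cue also records the index of the word it belongs to (so the rest of the word can be colored differently), and the names SyllableTimeline and Cue are invented for the example.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: maps a playback position to the syllable being spoken. */
class SyllableTimeline {
    /** One highlightable unit: a syllable and the index of the word it belongs to. */
    static final class Cue {
        final String syllable;
        final int wordIndex;

        Cue(String syllable, int wordIndex) {
            this.syllable = syllable;
            this.wordIndex = wordIndex;
        }
    }

    // Key: start time in milliseconds; value: the cue that starts at that time.
    private final TreeMap<Long, Cue> cues = new TreeMap<>();

    void add(long startMillis, String syllable, int wordIndex) {
        cues.put(startMillis, new Cue(syllable, wordIndex));
    }

    /** Returns the cue active at the given position, or null before the first cue. */
    Cue at(long positionMillis) {
        Map.Entry<Long, Cue> entry = cues.floorEntry(positionMillis);
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        SyllableTimeline timeline = new SyllableTimeline();
        timeline.add(0, "This", 0);
        timeline.add(2000, "is", 1);
        timeline.add(5000, "an", 2);
        timeline.add(7000, "ex", 3);
        timeline.add(8000, "am", 3);
        timeline.add(10000, "ple", 3);

        Cue current = timeline.at(8500); // the cue that started at 8000 ms
        System.out.println(current.syllable + " (word " + current.wordIndex + ")");
    }
}
```

The audio itself can stay in an ordinary audio file next to this data. If you play it through javax.sound.sampled.Clip, for instance, a UI timer could poll getMicrosecondPosition(), divide by 1000, and pass the result to at(...) to decide what to repaint.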

Upvotes: 1

anjanb

Reputation: 13867

You might want to get familiar with FreeTTS, an open source tool: http://freetts.sourceforge.net/docs/index.php

You might want to feed only a few words to the TTS engine at a time: highlight them, and once they have been spoken, de-highlight them and move on to the next batch of words.
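A rough sketch of that loop, kept independent of any particular TTS library: the speak callback is a stand-in for whatever call hands text to the engine, and the sketch assumes that call blocks until the words have actually been spoken. BatchReader and its methods are invented names, not FreeTTS API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper: reads a text aloud a few words at a time. */
class BatchReader {
    /** Splits the text on whitespace and groups the words into batches. */
    static List<String> batches(String text, int wordsPerBatch) {
        List<String> result = new ArrayList<>();
        if (text.trim().isEmpty()) {
            return result;
        }
        String[] words = text.trim().split("\\s+");
        for (int i = 0; i < words.length; i += wordsPerBatch) {
            int end = Math.min(i + wordsPerBatch, words.length);
            result.add(String.join(" ", Arrays.copyOfRange(words, i, end)));
        }
        return result;
    }

    /** Highlights a batch, speaks it (assumed to block), then clears the highlight. */
    static void read(String text, int wordsPerBatch,
                     Consumer<String> highlight,
                     Consumer<String> speak,
                     Consumer<String> unhighlight) {
        for (String batch : batches(text, wordsPerBatch)) {
            highlight.accept(batch);
            speak.accept(batch);
            unhighlight.accept(batch);
        }
    }

    public static void main(String[] args) {
        // Printing stands in for both the UI and the speech engine here.
        read("This is an example sentence", 2,
             b -> System.out.println("highlight: " + b),
             b -> System.out.println("speak:     " + b),
             b -> System.out.println("clear:     " + b));
    }
}
```

If the engine's speak call returns before the audio has finished, the clear step would need to be driven by a completion callback instead of running straight after it.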

BR,
~A

Upvotes: 0
