Reputation: 1227
I'm working on a simple TTS engine. It would be good to have an automatic diphone segmentation system that takes a recorded sound and a phoneme transcript (for a single utterance) and sets the phoneme boundaries in the sound. Can this be done with CMU Sphinx? Which version of Sphinx should I use?
Upvotes: 2
Views: 4454
Reputation: 25220
You can train a speaker-dependent acoustic model for your speaker with SphinxTrain. For details on training, see
http://cmusphinx.sourceforge.net/wiki/tutorialam
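Before training or aligning, you need a control file listing the utterance ids and a transcription file in the Sphinx format (one line per utterance, words wrapped in `<s> ... </s>` followed by the utterance id in parentheses). A minimal sketch of generating both, assuming hypothetical `wav/` and `prompts/` directories where each `<uttid>.wav` has a matching plain-text prompt `<uttid>.txt`:

```shell
#!/bin/sh
# Sketch: generate the -ctl (db.fileids) and -insent (db.transcription)
# files that SphinxTrain/sphinx3_align expect.
# The wav/ and prompts/ directory names are assumptions for this example.
make_db() {
    : > db.fileids
    : > db.transcription
    for f in wav/*.wav; do
        [ -e "$f" ] || continue              # skip if no recordings yet
        uttid=$(basename "$f" .wav)
        echo "$uttid" >> db.fileids
        # one line per utterance: <s> WORDS </s> (uttid)
        echo "<s> $(cat "prompts/$uttid.txt") </s> ($uttid)" >> db.transcription
    done
}
make_db
```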
To segment the database, you can use the sphinx3_align binary like this:
sphinx3_align \
-hmm <model_dir> \
-dict dictionary.dic \
-ctl db.fileids \
-cepdir <feats_folder> \
-cepext .mfc \
-insent db.transcription \
-outsent db.out \
-phlabdir phlabdir
The phone-level alignments will be written to the folder called phlabdir.
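If you pass `-phsegdir` instead, sphinx3_align writes one segmentation file per utterance with a `SFrm EFrm SegAScr Phone` column layout, where times are in frames. A small sketch converting frames to seconds, assuming that column layout and the default frame rate of 100 frames per second (the file path is hypothetical; adjust to your own output):

```shell
#!/bin/sh
# Convert a sphinx3_align phone segmentation to "start end phone" in seconds.
# Assumes the "SFrm EFrm SegAScr Phone" layout and 100 frames/second.
PHSEG=phsegdir/utt1.phseg
if [ -f "$PHSEG" ]; then
    # skip the header line, keep only well-formed rows
    awk 'NR > 1 && NF >= 4 { printf "%.2f %.2f %s\n", $1/100, $2/100, $4 }' "$PHSEG"
fi
```

These start/end times are what you would feed into your diphone cutter.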
Upvotes: 2