What is the next procedure after creating a CMUSphinx language model with my own dictionary?

Question

I have created my own CMUSphinx language model for Arabic, for software that will listen to a user and apply commands. I wrote the dictionary by hand and converted the ARPA language model to DMP format with the command sphinx_lm_convert -i ar.lm -o ar.lm.dmp. Here are the files I have so far:

.txt (the commands text file)
.wfreq (word frequency file)
.idngram (ngram file)
.dic (dictionary file)
.phone (phonemes file)
.lm (arpa language model file)
.lm.dmp (Darpa Trigram dump language model file)

I then recorded myself saying each word. Each word has its own .wav file, and they are all in one folder, separate from the folder that holds the .dic, .txt, and .lm files.

My question is: what is the next step? I was reading the tutorial here: http://cmusphinx.sourceforge.net/wiki/tutorial

It says that adapting an existing acoustic model is the next step after building the language model. Isn't that the same as training the language model?

And if it is training, I have all the required files except:

.transcription
.fileids

What should be inside these two files?

Thanks

Nikolay Shmyrev · Accepted Answer

The procedure for training an acoustic model is described in the tutorial on Acoustic Model Training.

You need to create the fileids and transcription files manually in a text editor, or with a script if you want to convert existing transcriptions from a custom form into the required format.

The fileids file must list the audio file names, and the transcription file must list the transcription of each of those files in a special format.

For an example of an acoustic model training database, look inside the an4 database.
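
As a rough illustration of the two files described above, here is a minimal Python sketch that generates them from a folder of recordings. It assumes the layout used in the an4 database: one file ID per line in the fileids file (the file name without its .wav extension), and one line per recording in the transcription file of the form `<s> words </s> (file_id)`. The function name `build_training_lists` and the `transcripts` mapping are hypothetical names chosen for this example, not part of CMUSphinx.

```python
from pathlib import Path


def build_training_lists(wav_dir, transcripts, fileids_path, transcription_path):
    """Write a fileids file and a transcription file for the .wav files in wav_dir.

    transcripts maps each recording's base name (no extension) to the words
    spoken in it. This mapping is an assumption of the sketch: you have to
    supply it yourself, for example from your commands text file.
    """
    wav_dir = Path(wav_dir)
    # File IDs are the .wav file names without the extension, in a stable order.
    stems = sorted(p.stem for p in wav_dir.glob("*.wav"))

    # Every recording needs a transcript, otherwise the two files would not line up.
    missing = [s for s in stems if s not in transcripts]
    if missing:
        raise ValueError(f"no transcript for: {missing}")

    # fileids: one file ID per line.
    with open(fileids_path, "w", encoding="utf-8") as f:
        for stem in stems:
            f.write(stem + "\n")

    # transcription: "<s> words </s> (file_id)", in the same order as fileids.
    with open(transcription_path, "w", encoding="utf-8") as f:
        for stem in stems:
            f.write(f"<s> {transcripts[stem]} </s> ({stem})\n")

    return stems
```

For example, calling it with a `transcripts` dictionary that maps each .wav base name to the command spoken in that file would produce one line per recording in each output file. The words in the transcription should be spelled exactly as they appear in the .dic file, so it is worth comparing the output against the an4 files before training.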
