Reputation: 107
How do I create a dictionary (.dict) file for a domain-specific language model? I'm using the CMU toolkit to create an ARPA-format language model, but it has no option for creating a .dict file. Thanks in advance.
Upvotes: 0
Views: 1003
Reputation: 69
There is a short tutorial page that explains several ways to generate the dictionary for Sphinx.
In general, for English there is an existing dictionary that covers a large number of words. If it is missing some of your domain-specific words, their pronunciations should be generated with a grapheme-to-phoneme (G2P) system such as those listed on the tutorial page mentioned above. A G2P system learns from an existing dictionary and generates pronunciations for words that are not in it.
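To find out which words actually need G2P, you can compare your language model vocabulary against the existing dictionary. The following is only a rough sketch: it assumes the usual Sphinx dictionary layout (one entry per line, the word followed by its phonemes, alternate pronunciations marked as `word(2)`), and the sample entries and the names `parse_dict` and `missing_words` are made up for illustration.

```python
import re

def parse_dict(lines):
    """Map each word to its list of pronunciations (each a list of phonemes)."""
    entries = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        # Strip the alternate-pronunciation marker, e.g. "hello(2)" -> "hello"
        word = re.sub(r"\(\d+\)$", "", parts[0])
        entries.setdefault(word, []).append(parts[1:])
    return entries

def missing_words(vocab, entries):
    """Return the vocabulary words that have no dictionary entry, sorted."""
    return sorted(w for w in set(vocab) if w not in entries)

# Hypothetical sample data; in practice read these from your .dict and vocabulary files.
sample_dict = [
    "hello HH AH L OW",
    "hello(2) HH EH L OW",
    "world W ER L D",
]
entries = parse_dict(sample_dict)
vocab = ["hello", "world", "sphinx"]
print(missing_words(vocab, entries))
```

The words it reports are the ones to run through G2P (or transcribe by hand) and then append to the .dict file.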
One thing to take into account is the acoustic model. If you use one of the pretrained Sphinx models, make sure the new pronunciations use the same phoneme set as the dictionary the model was trained with.
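A quick way to check this is to collect the phonemes used in the generated entries and compare them with those in the dictionary that came with the model. This is a minimal sketch under the same format assumption (word first, phonemes after it); the function names and sample lines are hypothetical. The sample shows one mismatch you might run into: a G2P model that outputs stress-marked phonemes such as `AH0` while the reference dictionary uses plain `AH`.

```python
def phoneme_set(lines):
    """Collect every phoneme symbol used in a list of dictionary lines."""
    phones = set()
    for line in lines:
        parts = line.split()
        phones.update(parts[1:])  # everything after the word is a phoneme
    return phones

def unknown_phonemes(new_lines, reference_lines):
    """Phonemes used in the new entries but absent from the reference dictionary."""
    return phoneme_set(new_lines) - phoneme_set(reference_lines)

# Hypothetical sample data; in practice read both files from disk.
reference = [
    "hello HH AH L OW",
    "world W ER L D",
]
generated = [
    "holla HH AH0 L OW",  # stress-marked phoneme, not in the reference set
]
print(unknown_phonemes(generated, reference))
```

An empty result means every generated phoneme also appears in the reference dictionary. Anything it prints has to be mapped to the model's phoneme set before the entries are used.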
Upvotes: 1