How to build a language model from the phonetic transcription?

I built a language model for Tamil from Wikipedia dumps using the CMUCLMTK tool. Now, how do I generate the phonetic transcription and substitute it into the model? The wiki article (http://cmusphinx.sourceforge.net/wiki/phonemerecognition) says to replace the words with their transcription. What am I supposed to do now?

Upvotes: 0

Answers (1)

Abhishek Dandona

Reputation: 21

You can write a Python script to replace each character with its phoneme. English has around 44 phonemes, so you can simply create a dictionary that maps each character to its phoneme. To convert your transcription to phonemes, break each word into characters and look each one up in the dictionary. You can make this more interesting by using term frequency or tf-idf.
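A minimal sketch of that dictionary approach, assuming a purely per-character mapping is good enough for your data. The table `CHAR_TO_PHONEME` and the helper names are hypothetical and only cover a few Latin letters for illustration; for Tamil you would fill the table with Tamil characters and the phone set your acoustic model uses.

```python
# Hypothetical, incomplete character-to-phoneme table (illustration only).
# Replace the keys and values with the characters and phone set you actually use.
CHAR_TO_PHONEME = {
    "a": "AA",
    "i": "IY",
    "k": "K",
    "l": "L",
    "m": "M",
    "t": "T",
}


def word_to_phonemes(word, table=CHAR_TO_PHONEME):
    """Map each character of a word to a phoneme, skipping unmapped characters."""
    return [table[ch] for ch in word.lower() if ch in table]


def transcribe_line(line, table=CHAR_TO_PHONEME):
    """Turn a line of words into one space-separated phoneme string."""
    phonemes = []
    for word in line.split():
        phonemes.extend(word_to_phonemes(word, table))
    return " ".join(phonemes)


def transcribe_file(src_path, dst_path, table=CHAR_TO_PHONEME):
    """Write a phoneme-level copy of a training text, one output line per input line."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(transcribe_line(line, table) + "\n")
```

You would run something like `transcribe_file` over the same training text you fed to CMUCLMTK, then build the language model from the phoneme-level file instead of the word-level one. Characters missing from the table are silently dropped here, which is a simplification; you may prefer to log them so gaps in the mapping are visible.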

Upvotes: 1
