dab1984

Reputation: 47

Issue creating a language model for Sinhala using SRILM

I'm trying to build a Sinhala voice recognition system using pocketsphinx, and I use the SRILM toolkit to create the language model. My source files for the language model are here. I'm using Cygwin on Windows 8.1 to run SRILM 1.7.1. But when I run the command

ngram-count -vocab sinhalalexicon.txt -text sinhalacorpus.Train -order 3 -write sinhala.count -unk

I'm getting

iconv: Invalid or incomplete multibyte or wide character
iconv: Invalid or incomplete multibyte or wide character

What did I do wrong here? The sinhalacorpus.Train file was created manually using Notepad++.

Upvotes: 1

Views: 109

Answers (1)

dab1984

Reputation: 47

I found the solution to my issue. Once I converted the corpus and lexicon files to Unix format (LF line endings) and changed the encoding to UTF-8 without BOM, it worked. I used Notepad++ to make the changes.
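
For anyone who would rather script the conversion than do it by hand in Notepad++, here is a minimal Python sketch of the same two steps: strip the BOM and re-encode as UTF-8, then convert line endings to LF. The function names `normalize_bytes` and `normalize_file` are made up for illustration. The sketch assumes the input is either UTF-8 (with or without BOM) or UTF-16 with a BOM, which are common Notepad++ save formats; a file in any other encoding will raise a `UnicodeDecodeError` or be decoded incorrectly, so check your file's actual encoding first.

```python
import codecs
from pathlib import Path

# Byte-order marks this sketch recognizes, paired with the codec to decode with.
# Assumption: the file is UTF-8 or BOM-prefixed UTF-16; other encodings are not handled.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def normalize_bytes(raw: bytes) -> bytes:
    """Return raw re-encoded as UTF-8 without BOM and with LF line endings."""
    encoding = "utf-8"  # assumed when no BOM is present
    for bom, name in _BOMS:
        if raw.startswith(bom):
            raw = raw[len(bom):]  # drop the BOM itself
            encoding = name
            break
    text = raw.decode(encoding)  # raises UnicodeDecodeError on invalid input
    # Windows (CRLF) and old Mac (CR) line endings both become Unix (LF).
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return text.encode("utf-8")


def normalize_file(path: str) -> None:
    """Rewrite the file at path in place (hypothetical helper; back up first)."""
    p = Path(path)
    p.write_bytes(normalize_bytes(p.read_bytes()))


# Example usage with the file names from the question:
# normalize_file("sinhalalexicon.txt")
# normalize_file("sinhalacorpus.Train")
```

Because `normalize_file` overwrites the file, keep a copy of the original corpus and lexicon before running it.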

Upvotes: 1
