Reputation: 1497
I'm trying to adapt the WSJ model to understand only 4 words from me. I have created a bash file and I've tried nearly 20 times, but when I run it and say "stop", it fails up to 90% of the time. Here's my bash file. Please let me know: am I doing anything wrong, or do I need to train it much more, like 100 times?
#!/bin/bash
for i in {1..4}
do
fn=`printf arctic_%04d $i`;
read sent; echo $sent;
rec -r 16000 -e signed-integer -b 16 -c 1 $fn.wav 2>/dev/null;
done < arctic20.txt
sphinx_fe -argfile Model/feat.params \
-samprate 16000 -c arctic20.fileids -di . -do . \
-ei wav -eo mfc -mswav yes
bw/bw \
-hmmdir Model \
-moddeffn Model/mdef \
-ts2cbfn .cont. \
-feat 1s_c_d_dd \
-cmn current \
-agc none \
-dictfn arctic20.dic \
-ctlfn arctic20.fileids \
-lsnfn arctic20.transcription \
-accumdir .
cp -a Model/* Model.adapted
map_adapt/map_adapt \
-meanfn Model/means \
-varfn Model/variances \
-mixwfn Model/mixture_weights \
-tmatfn Model/transition_matrices \
-accumdir . \
-mapmeanfn Model.adapted/means \
-mapvarfn Model.adapted/variances \
-mapmixwfn Model.adapted/mixture_weights \
-maptmatfn Model.adapted/transition_matrices
cp -r Model.adapted/* ~/NetBeansProjects/sphinx4-1.0beta6/models/acoustic/wsj
cp -r Model.adapted/* Model
I run it over and over again. Then I Clean and Build the project and run the helloworld demo; I modified the .gram file there. By the way, here is the transcription:
< s> stop < /s> (arctic_0001)
< s> left < /s> (arctic_0002)
< s> right < /s> (arctic_0003)
< s> go < /s> (arctic_0004)
(I added spaces so that it doesn't get read as code here.) The dictionary and fileids are also OK.
Thanks
P.S. Thanks to dariusz, but it still doesn't work.
Upvotes: 3
Views: 2071
Reputation: 105
You shouldn't use MAP at all. MAP adaptation requires a much larger amount of adaptation data, as it modifies all parameters in the model. You will have a better chance with MLLR, which is available in Sphinx. Here's the tutorial: http://cmusphinx.sourceforge.net/wiki/tutorialadapt
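A minimal sketch of the MLLR step, assuming the `bw` statistics from the question's script are already in the accumulator directory. The `-meanfn`, `-varfn`, `-outmllrfn` and `-accumdir` flags follow the linked tutorial; the `run_mllr` wrapper, the `MLLR_SOLVE` variable and the default `mllr_solve/mllr_solve` path are assumptions that mirror the question's `bw/bw` layout.

```shell
#!/bin/bash
# Hypothetical wrapper around SphinxTrain's mllr_solve.
# MLLR_SOLVE (assumed name) lets you point at wherever the binary lives.
run_mllr() {
    local model_dir=$1   # acoustic model directory, e.g. Model
    local accum_dir=$2   # directory where bw wrote its accumulators, e.g. .
    "${MLLR_SOLVE:-mllr_solve/mllr_solve}" \
        -meanfn "$model_dir/means" \
        -varfn "$model_dir/variances" \
        -outmllrfn mllr_matrix \
        -accumdir "$accum_dir"
}

# Example call, replacing the map_adapt step of the question's script:
# run_mllr Model .
```

Unlike `map_adapt`, this leaves the model files untouched and writes a single `mllr_matrix` file; the tutorial describes how to pass that matrix to the decoder.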
Upvotes: 1
Reputation: 22281
It is very difficult to determine what is going on in such a complex process.
What you should do is set up a repeatable test case and use it to verify your progress. It should contain at least 100 test sentences (words, in your case). It can be done with sphinx, see this link
Only after you have the test ready, proceed to make changes to the acoustic model or grammar. Compare the accuracy after each change with the accuracy of the original (unmodified) model. Then you will know which steps are good and which are bad.
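As a rough illustration of such a repeatable check, here is a small sketch that scores single-word test utterances. It is not the Sphinx tooling the answer refers to: the `accuracy` function and the two input files (one expected word per line, one recognized word per line, in the same order) are assumptions made for this example.

```shell
#!/bin/bash
# Hypothetical helper: count how many recognized words match the expected ones.
# $1 = file with one expected word per line
# $2 = file with one recognized word per line, in the same order
accuracy() {
    local ref_file=$1 hyp_file=$2
    local total=0 correct=0 ref hyp
    # Read both files in lockstep through file descriptors 3 and 4.
    while IFS= read -r ref <&3 && IFS= read -r hyp <&4; do
        total=$((total + 1))
        if [ "$ref" = "$hyp" ]; then
            correct=$((correct + 1))
        fi
    done 3<"$ref_file" 4<"$hyp_file"
    if [ "$total" -eq 0 ]; then
        echo "no test utterances found"
        return 1
    fi
    echo "$correct/$total correct ($((100 * correct / total))%)"
}

# Example: accuracy expected.txt recognized_original.txt
#          accuracy expected.txt recognized_adapted.txt
```

Running it once with the output of the unmodified model and once after each adaptation attempt gives two numbers that can be compared directly.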
Another thing is the training data. I may be wrong, but I think that such short one-word audio files are not the best for adapting the model. I would suggest using longer files, even if it means repeating the same word several times. Just make sure you say exactly what the transcription contains and leave clear pauses between words.
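To make the suggestion about longer files concrete, here is a sketch that writes transcription lines with each word repeated several times. The line format and the `arctic_%04d` file IDs are taken from the question; the `make_transcription` function and the choice of three repetitions are assumptions for illustration, and whether this actually improves adaptation is untested.

```shell
#!/bin/bash
# Hypothetical helper: print one transcription line per word,
# with the word repeated $1 times between <s> and </s>.
make_transcription() {
    local reps=$1
    shift
    local i=0 n word line
    for word in "$@"; do
        i=$((i + 1))
        line="<s>"
        n=0
        while [ "$n" -lt "$reps" ]; do
            line="$line $word"
            n=$((n + 1))
        done
        # File IDs follow the arctic_0001, arctic_0002, ... naming of the question.
        printf '%s </s> (arctic_%04d)\n' "$line" "$i"
    done
}

# Example: make_transcription 3 stop left right go > arctic20.transcription
```

Each recording then has to contain the word spoken the same number of times as in its transcription line, with a clear pause between repetitions.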
Upvotes: 1