Reputation: 2327
I am trying to run the tutorial program for live speech recognition using Sphinx4. This is the main class:
import edu.cmu.sphinx.api.Configuration;
import edu.cmu.sphinx.api.LiveSpeechRecognizer;
import edu.cmu.sphinx.api.SpeechResult;
import edu.cmu.sphinx.result.WordResult;

public class LiveRecognition {
    public static void main(String[] args) throws Exception {
        // Use the stock en-us acoustic model, dictionary and language model.
        Configuration configuration = new Configuration();
        configuration.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
        configuration.setDictionaryPath("resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict");
        configuration.setLanguageModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us.lm.bin");
        configuration.setUseGrammar(false);

        LiveSpeechRecognizer recognizer = new LiveSpeechRecognizer(configuration);
        recognizer.startRecognition(true);

        // Print every recognized word until the recognizer returns null.
        SpeechResult result;
        while ((result = recognizer.getResult()) != null) {
            for (WordResult word : result.getWords()) {
                System.out.println(word);
            }
        }
        recognizer.stopRecognition();
    }
}
So far I am using the dictionary and acoustic model provided by Sphinx. When I run the program, it keeps producing random text, almost as if it were talking to itself, and the output is nowhere near what I say into the microphone. For example, the output looks like this:
....
{between, 1.000, [2700:3610]}
23:21:37.391 INFO speedTracker This Time Audio: 0.83s Proc: 3.82s Speed: 4.60 X real time
23:21:37.391 INFO speedTracker Total Time Audio: 1.58s Proc: 7.66s 4.85 X real time
23:21:37.391 INFO memoryTracker Mem Total: 1173.00 Mb Free: 410.17 Mb
23:21:37.393 INFO memoryTracker Used: This: 762.83 Mb Avg: 507.82 Mb Max: 762.83 Mb
23:21:37.393 INFO trieNgramModel LM Cache Size: 4183 Hits: 990660 Misses: 4183
{<sil>, 1.000, [3610:5810]}
{what, 1.000, [5820:6380]}
23:21:41.615 INFO speedTracker This Time Audio: 0.55s Proc: 2.21s Speed: 4.01 X real time
23:21:41.615 INFO speedTracker Total Time Audio: 2.13s Proc: 9.87s 4.63 X real time
23:21:41.615 INFO memoryTracker Mem Total: 1316.50 Mb Free: 540.36 Mb
23:21:41.615 INFO memoryTracker Used: This: 776.14 Mb Avg: 597.26 Mb Max: 776.14 Mb
23:21:41.615 INFO trieNgramModel LM Cache Size: 5332 Hits: 1263784 Misses: 5332
{<sil>, 1.000, [6380:9060]}
{ooh, 1.000, [9070:9280]}
....
What am I doing wrong? I want to see "hello world" when I say "hello world". Both words are present in the dictionary.
[UPDATE] I built a small language model file and a matching dictionary from a small corpus file using this online service, as described here. This time recognition was more accurate, still using the default acoustic model from the sphinx-data library. I don't need to train the acoustic model, since I will be dealing mostly with US English. But I want a good language model and dictionary for general-purpose short sentences. The language model that ships with Sphinx is not working well for me.
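When pairing a small corpus with a dictionary as described above, it can help to confirm that every corpus word actually has a dictionary entry before running the recognizer. The following is only a sketch, not part of Sphinx: the class and method names are made up for illustration, and it assumes a CMUdict-style dictionary where each line starts with the word followed by its phones, with alternate pronunciations marked as word(2).

import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not a Sphinx class.
class DictionaryCheck {
    /** Collects head words from lines such as "hello HH AH L OW" or "hello(2) HH EH L OW". */
    static Set<String> headWords(List<String> dictLines) {
        Set<String> words = new HashSet<>();
        for (String line : dictLines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) continue;
            String head = trimmed.split("\\s+")[0];
            // Drop the "(2)" style suffix used for alternate pronunciations.
            head = head.replaceAll("\\(\\d+\\)$", "");
            words.add(head.toLowerCase(Locale.ROOT));
        }
        return words;
    }

    /** Returns the corpus words (lowercased, sorted) that have no dictionary entry. */
    static Set<String> missingWords(List<String> corpusLines, Set<String> dictWords) {
        Set<String> missing = new TreeSet<>();
        for (String sentence : corpusLines) {
            // Split on anything that is not a letter or apostrophe.
            for (String token : sentence.toLowerCase(Locale.ROOT).split("[^a-z']+")) {
                if (!token.isEmpty() && !dictWords.contains(token)) missing.add(token);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> dict = Arrays.asList("hello HH AH L OW", "hello(2) HH EH L OW", "world W ER L D");
        List<String> corpus = Arrays.asList("hello world", "open sesame");
        System.out.println("Missing from dictionary: " + missingWords(corpus, headWords(dict)));
    }
}

In real use the two lists would come from the corpus and dictionary files, for example via java.nio.file.Files.readAllLines.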
[UPDATE] Since Nikolay Shmyrev mentioned below it could be due to poor computing performance, this is what I use:
Processing power can be increased if needed.
Upvotes: 1
Views: 4757
Reputation: 25220
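Your computer is too slow. Your log shows it: 0.83s of audio took 3.82s to process, i.e. about 4.6 times real time, so the decoder cannot keep up with the microphone and the results are inaccurate. On slow computers, use pocketsphinx instead.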
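To check this on your own logs, you can compute the ratio of processing time to audio time from a speedTracker line. The snippet below is a minimal sketch with made-up names (RealTimeFactor, fromLogLine); it assumes the line format shown in the question's output. A value above 1.0 means the decoder is slower than real time.

import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not a Sphinx class.
class RealTimeFactor {
    // Matches the "Audio: 0.83s Proc: 3.82s" part of a speedTracker line.
    private static final Pattern TIMES =
            Pattern.compile("Audio:\\s*([0-9.]+)s\\s+Proc:\\s*([0-9.]+)s");

    /** Returns processing time divided by audio time, or NaN if the line has no timings. */
    static double fromLogLine(String line) {
        Matcher m = TIMES.matcher(line);
        if (!m.find()) return Double.NaN;
        double audio = Double.parseDouble(m.group(1));
        double proc = Double.parseDouble(m.group(2));
        return proc / audio;
    }

    public static void main(String[] args) {
        String line = "23:21:37.391 INFO speedTracker This Time Audio: 0.83s Proc: 3.82s Speed: 4.60 X real time";
        double rtf = fromLogLine(line);
        System.out.printf(Locale.ROOT, "%.2f x real time, keeps up: %b%n", rtf, rtf <= 1.0);
    }
}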
Pocketsphinx has a Java/JNI API too. You can find an example here; it should look like this:
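import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import edu.cmu.pocketsphinx.Config;
import edu.cmu.pocketsphinx.Decoder;
import edu.cmu.pocketsphinx.Segment;

// Point the decoder at the acoustic model, language model and dictionary.
Config c = Decoder.defaultConfig();
c.setString("-hmm", "../../model/en-us/en-us");
c.setString("-lm", "../../model/en-us/en-us.lm.bin");
c.setString("-dict", "../../model/en-us/cmudict-en-us.dict");
Decoder d = new Decoder(c);

FileInputStream ais = new FileInputStream(new File("../../test/data/goforward.raw"));
d.startUtt();
d.setRawdataSize(300000);

// Feed the raw audio to the decoder in 4096-byte chunks.
byte[] b = new byte[4096];
int nbytes;
while ((nbytes = ais.read(b)) >= 0) {
    // The raw file holds little-endian 16-bit samples; convert bytes to shorts.
    ByteBuffer bb = ByteBuffer.wrap(b, 0, nbytes);
    bb.order(ByteOrder.LITTLE_ENDIAN);
    short[] s = new short[nbytes/2];
    bb.asShortBuffer().get(s);
    d.processRaw(s, nbytes/2, false, false);
}
ais.close();
d.endUtt();

// Print the best hypothesis for the whole utterance.
System.out.println(d.hyp().getHypstr());

// Dump the audio the decoder kept, for inspection.
short[] data = d.getRawdata();
System.out.println("Data size: " + data.length);
DataOutputStream dos = new DataOutputStream(new FileOutputStream(new File("/tmp/test.raw")));
for (int i = 0; i < data.length; i++) {
    dos.writeShort(data[i]);
}
dos.close();

// Print the recognized words one segment at a time.
for (Segment seg : d.seg()) {
    System.out.println(seg.getWord());
}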
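The byte-to-sample conversion inside the read loop is the step that most easily goes wrong, so here it is in isolation. This is a standalone sketch using only the JDK; PcmBytes and toSamples are made-up names, and it assumes the input is little-endian 16-bit PCM as in the example above.

import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical helper, not a pocketsphinx class.
class PcmBytes {
    /** Converts the first nbytes of raw little-endian 16-bit PCM into samples. */
    static short[] toSamples(byte[] raw, int nbytes) {
        // A trailing odd byte, if any, is ignored.
        short[] samples = new short[nbytes / 2];
        ByteBuffer.wrap(raw, 0, nbytes)
                .order(ByteOrder.LITTLE_ENDIAN)
                .asShortBuffer()
                .get(samples);
        return samples;
    }

    public static void main(String[] args) {
        // Low byte first: 01 00 is 1, FF 7F is 32767, 00 80 is -32768.
        byte[] raw = {0x01, 0x00, (byte) 0xFF, 0x7F, 0x00, (byte) 0x80};
        System.out.println(Arrays.toString(toSamples(raw, raw.length)));
    }
}

Note that DataOutputStream.writeShort writes the high byte first, so the dump written to /tmp/test.raw in the example is big-endian, unlike the input file.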
Upvotes: 0