Recognizing live speech with the Sphinx4 Java API

Question

I am trying to run the tutorial program for live speech recognition using Sphinx4. This is the main class:

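import edu.cmu.sphinx.api.Configuration;
import edu.cmu.sphinx.api.LiveSpeechRecognizer;
import edu.cmu.sphinx.api.SpeechResult;
import edu.cmu.sphinx.result.WordResult;

public class LiveRecognition {

    public static void main(String[] args) throws Exception {
        // Use the default US English models shipped with sphinx-data.
        Configuration configuration = new Configuration();
        configuration.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
        configuration.setDictionaryPath("resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict");
        configuration.setLanguageModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us.lm.bin");
        // Decode with the statistical language model, not a grammar.
        configuration.setUseGrammar(false);

        LiveSpeechRecognizer recognizer = new LiveSpeechRecognizer(configuration);

        // true = discard audio that was buffered before recognition started.
        recognizer.startRecognition(true);

        // Print every recognized word with its score and time frame.
        SpeechResult result;
        while ((result = recognizer.getResult()) != null) {
            for (WordResult word : result.getWords()) {
                System.out.println(word);
            }
        }
        recognizer.stopRecognition();
    }
}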

So far I am using the dictionary and acoustic model provided by Sphinx. When I run the program, it keeps producing random text, almost as if it were talking to itself, and the output is nowhere near what I say into the microphone. For example, the output looks like this:

....
{between, 1.000, [2700:3610]}
23:21:37.391 INFO speedTracker            This  Time Audio: 0.83s  Proc: 3.82s  Speed: 4.60 X real time
23:21:37.391 INFO speedTracker            Total Time Audio: 1.58s  Proc: 7.66s 4.85 X real time
23:21:37.391 INFO memoryTracker           Mem  Total: 1173.00 Mb  Free: 410.17 Mb
23:21:37.393 INFO memoryTracker           Used: This: 762.83 Mb  Avg: 507.82 Mb  Max: 762.83 Mb
23:21:37.393 INFO trieNgramModel       LM Cache Size: 4183 Hits: 990660 Misses: 4183
{, 1.000, [3610:5810]}
{what, 1.000, [5820:6380]}
23:21:41.615 INFO speedTracker            This  Time Audio: 0.55s  Proc: 2.21s  Speed: 4.01 X real time
23:21:41.615 INFO speedTracker            Total Time Audio: 2.13s  Proc: 9.87s 4.63 X real time
23:21:41.615 INFO memoryTracker           Mem  Total: 1316.50 Mb  Free: 540.36 Mb
23:21:41.615 INFO memoryTracker           Used: This: 776.14 Mb  Avg: 597.26 Mb  Max: 776.14 Mb
23:21:41.615 INFO trieNgramModel       LM Cache Size: 5332 Hits: 1263784 Misses: 5332
{, 1.000, [6380:9060]}
{ooh, 1.000, [9070:9280]}
....

What am I doing wrong? I want to see "hello world" when I say "hello world". Both words are present in the dictionary.

[UPDATE] I built a small language model file and a corresponding dictionary from a small corpus file using this online service, as described here. This time recognition was more accurate, still using the default acoustic model provided with the sphinx-data library. I don't need to train the acoustic model, since I will mostly be dealing with US English. But I want a good language model and dictionary for general-purpose short sentences; the language model that comes with Sphinx is not working well for me.
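
For reference, plugging a custom language model and dictionary into the program might look roughly like this. This is only a sketch: the models/ directory and the file names custom.lm and custom.dic are hypothetical placeholders for whatever the corpus tool produced, and the acoustic model is still the default one from sphinx-data.

```java
import edu.cmu.sphinx.api.Configuration;
import edu.cmu.sphinx.api.LiveSpeechRecognizer;
import edu.cmu.sphinx.api.SpeechResult;

public class CustomModelRecognition {

    public static void main(String[] args) throws Exception {
        Configuration configuration = new Configuration();
        // Default US English acoustic model, unchanged.
        configuration.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
        // Hypothetical paths: replace with the generated dictionary and language model.
        configuration.setDictionaryPath("file:models/custom.dic");
        configuration.setLanguageModelPath("file:models/custom.lm");
        configuration.setUseGrammar(false);

        LiveSpeechRecognizer recognizer = new LiveSpeechRecognizer(configuration);
        recognizer.startRecognition(true);

        // Print the best hypothesis for each utterance.
        SpeechResult result;
        while ((result = recognizer.getResult()) != null) {
            System.out.println(result.getHypothesis());
        }
        recognizer.stopRecognition();
    }
}
```

Only the dictionary and language model paths differ from the main class above; everything else stays the same.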

[UPDATE] Since Nikolay Shmyrev mentioned below that it could be due to poor computing performance, this is the hardware I use:

Intel® Core™ i7-4790 CPU @ 3.60GHz
16 GB DDR3 RAM
Windows 10 and Ubuntu 14.04

Processing power can be increased if needed.
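
To put a number on the performance concern: the speedTracker lines in the log earlier in the question report audio duration and processing time, and their ratio is the "X real time" figure (values above 1 mean decoding is slower than the incoming audio). A minimal sketch of that arithmetic follows, assuming the log format shown above; the class and method names are made up for illustration and are not part of Sphinx4.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RealTimeFactor {

    // Matches the "Audio: 0.83s  Proc: 3.82s" part of a speedTracker line.
    private static final Pattern TIMES =
            Pattern.compile("Audio:\\s*([0-9.]+)s\\s+Proc:\\s*([0-9.]+)s");

    /**
     * Returns processing time divided by audio time,
     * or NaN if the line has no Audio/Proc timings.
     */
    static double realTimeFactor(String logLine) {
        Matcher m = TIMES.matcher(logLine);
        if (!m.find()) {
            return Double.NaN;
        }
        double audioSeconds = Double.parseDouble(m.group(1));
        double procSeconds = Double.parseDouble(m.group(2));
        return procSeconds / audioSeconds;
    }

    public static void main(String[] args) {
        // Line copied from the log in the question.
        String line = "23:21:37.391 INFO speedTracker            "
                + "This  Time Audio: 0.83s  Proc: 3.82s  Speed: 4.60 X real time";
        System.out.printf(Locale.ROOT, "%.2f x real time%n", realTimeFactor(line));
    }
}
```

For the line used here, 3.82 s of processing for 0.83 s of audio gives roughly 4.6, which agrees with the figure Sphinx logs: the recognizer takes more than four times as long as the audio it is decoding.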
