coding4fun

Reputation: 3507

Sphinx4 - IllegalArgumentException

I have created all the files required to run Sphinx4 (language model, dictionary and acoustic model). But when I run it in Eclipse, the following exception is thrown:

00:16:12.707 INFO unitManager          CI Unit: AE
00:16:12.713 INFO unitManager          CI Unit: AH
00:16:12.714 INFO unitManager          CI Unit: B
00:16:12.714 INFO unitManager          CI Unit: EY
00:16:12.715 INFO unitManager          CI Unit: F
00:16:12.715 INFO unitManager          CI Unit: IY
00:16:12.716 INFO unitManager          CI Unit: JH
00:16:12.716 INFO unitManager          CI Unit: L
00:16:12.717 INFO unitManager          CI Unit: M
00:16:12.722 INFO autoCepstrum         Cepstrum component auto-configured as follows: autoCepstrum {MelFrequencyFilterBank, DiscreteCosineTransform}
00:16:12.853 INFO dictionary           Loading dictionary from: file:Alphabets/tutorial/alphabets/etc/alphabets.dic
00:16:12.853 INFO dictionary           Loading filler dictionary from: file:Alphabets/tutorial/alphabets/model_parameters/alphabets.ci_cont/noisedict
00:16:12.854 INFO acousticModelLoader  Loading tied-state acoustic model from: file:Alphabets/tutorial/alphabets/model_parameters/alphabets.ci_cont
00:16:12.854 INFO acousticModelLoader  Pool means Entries: 30
00:16:12.855 INFO acousticModelLoader  Pool variances Entries: 30
00:16:12.855 INFO acousticModelLoader  Pool transition_matrices Entries: 10
00:16:12.855 INFO acousticModelLoader  Pool senones Entries: 30
00:16:12.855 INFO acousticModelLoader  Pool mixture_weights Entries: 30
00:16:12.856 INFO acousticModelLoader  Pool senones Entries: 30
00:16:12.856 INFO acousticModelLoader  Context Independent Unit Entries: 10
00:16:12.856 INFO acousticModelLoader  HMM Manager: 10 hmms
00:16:12.860 INFO acousticModel        CompositeSenoneSequences: 0
00:16:12.861 INFO largeTrigramModel    Loading n-gram language model from: file:Alphabets/tutorial/alphabets/etc/alphabets.lm.dmp
00:16:12.867 INFO largeTrigramModel    1-grams: 3
00:16:12.867 INFO largeTrigramModel    2-grams: 1
00:16:12.867 INFO largeTrigramModel    3-grams: 1
00:16:13.094 INFO lexTreeLinguist      Max CI Units 11
00:16:13.095 INFO lexTreeLinguist      Unit table size 1331
Exception in thread "main" java.lang.IllegalArgumentException
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:111)
    at edu.cmu.sphinx.linguist.WordSequence.getWord(WordSequence.java:179)
    at edu.cmu.sphinx.linguist.language.ngram.large.LargeNGramModel.getNGramProbDepth(LargeNGramModel.java:409)
    at edu.cmu.sphinx.linguist.language.ngram.large.LargeNGramModel.getNGramProbDepth(LargeNGramModel.java:412)
    at edu.cmu.sphinx.linguist.language.ngram.large.LargeNGramModel.getNGramProbDepth(LargeNGramModel.java:412)
    at edu.cmu.sphinx.linguist.language.ngram.large.LargeNGramModel.getProbDepth(LargeNGramModel.java:393)
    at edu.cmu.sphinx.linguist.lextree.LexTreeLinguist$LexTreeState.createWordStateArc(LexTreeLinguist.java:720)
    at edu.cmu.sphinx.linguist.lextree.LexTreeLinguist$LexTreeWordState.getSuccessors(LexTreeLinguist.java:1491)
    at edu.cmu.sphinx.decoder.search.WordPruningBreadthFirstSearchManager.collectSuccessorTokens(WordPruningBreadthFirstSearchManager.java:635)
    at edu.cmu.sphinx.decoder.search.WordPruningBreadthFirstSearchManager.growBranches(WordPruningBreadthFirstSearchManager.java:387)
    at edu.cmu.sphinx.decoder.search.WordPruningBreadthFirstSearchManager.localStart(WordPruningBreadthFirstSearchManager.java:359)
    at edu.cmu.sphinx.decoder.search.WordPruningBreadthFirstSearchManager.startRecognition(WordPruningBreadthFirstSearchManager.java:262)
    at edu.cmu.sphinx.decoder.Decoder.decode(Decoder.java:62)
    at edu.cmu.sphinx.recognizer.Recognizer.recognize(Recognizer.java:109)
    at edu.cmu.sphinx.recognizer.Recognizer.recognize(Recognizer.java:125)
    at edu.cmu.sphinx.api.AbstractSpeechRecognizer.getResult(AbstractSpeechRecognizer.java:50)
    at Main.main(Main.java:30)

This is the program I am running, written as described on the official website:

import java.io.IOException;
import java.util.Scanner;

import edu.cmu.sphinx.api.Configuration;
import edu.cmu.sphinx.api.LiveSpeechRecognizer;
import edu.cmu.sphinx.api.SpeechResult;

public class Main {

    public static void main(String[] args) {

        Configuration configuration = new Configuration();

        configuration
                .setAcousticModelPath("Alphabets/tutorial/alphabets/model_parameters/alphabets.ci_cont");

        configuration.setDictionaryPath("Alphabets/tutorial/alphabets/etc/alphabets.dic");

        configuration
                .setLanguageModelPath("Alphabets/tutorial/alphabets/etc/alphabets.lm.dmp");

        LiveSpeechRecognizer recognizer = null;
        try {
            recognizer = new LiveSpeechRecognizer(configuration);
        } catch (IOException e) {
            e.printStackTrace();
        }
        recognizer.startRecognition(true);

        SpeechResult result = recognizer.getResult();

        recognizer.stopRecognition();

        System.out.println(result.getHypothesis());
        result.getLattice().dumpDot("lattice.dot", "lattice");

    }
}

Any help is highly appreciated!

Upvotes: 1

Views: 224

Answers (1)

Nikolay Shmyrev

Reputation: 25220

Your language model /Alphabets/tutorial/alphabets/etc/alphabets.lm.dmp is in text ARPA format, but you added a dmp extension to it. This manual rename confuses the recognizer. To fix the issue, rename alphabets.lm.dmp to alphabets.lm, without the dmp extension, and update the name in the code. Just use

configuration.setLanguageModelPath("Alphabets/tutorial/alphabets/etc/alphabets.lm");
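
If you are not sure which format a language model file is really in, a quick check before renaming can help. A text ARPA file contains a literal \data\ header line near the top, while a binary dump does not. The following is only a rough sketch under that assumption: the class name LanguageModelFormatCheck, the method looksLikeArpaText and the 4096-byte window are made up for illustration and are not part of Sphinx4.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Sphinx4: guesses whether a language
// model file is text ARPA by looking for the "\data\" header marker.
class LanguageModelFormatCheck {

    // Assumption: the "\data\" marker appears within the first 4096 bytes.
    // An ARPA file with a very long comment header would be missed.
    private static final int HEAD_BYTES = 4096;

    static boolean looksLikeArpaText(Path file) throws IOException {
        byte[] head = new byte[HEAD_BYTES];
        int read;
        try (InputStream in = Files.newInputStream(file)) {
            // readNBytes (Java 9+) reads until the buffer is full or EOF.
            read = in.readNBytes(head, 0, head.length);
        }
        // ISO-8859-1 maps every byte to a char, so binary input
        // never causes a decoding error here.
        String text = new String(head, 0, read, StandardCharsets.ISO_8859_1);
        return text.contains("\\data\\");
    }
}
```

For example, calling LanguageModelFormatCheck.looksLikeArpaText(Paths.get("Alphabets/tutorial/alphabets/etc/alphabets.lm.dmp")) and getting true would suggest the file is text ARPA despite its dmp extension. This is a heuristic only, so opening the file in a text editor remains the more reliable check.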

You also do not have enough data to train the model, so your model is not going to work. A significant amount of training data is mandatory. You can find details in the acoustic model training tutorial:

http://cmusphinx.sourceforge.net/wiki/tutorialam

Upvotes: 1
