user3246661

Reputation: 11

Sphinx4: How can I improve the accuracy of recognizing a WAV file in the dialog demo?

I have edited the dialog demo code to make it work for my project.

  1. I created a text file with some of the possible sentences to be used in my work. I added the link in the comment section.
  2. I followed the steps at http://cmusphinx.sourceforge.net/wiki/tutoriallm to build my language model using the web service.
  3. Then I edited the dialog code to be:

    package dialog;

    import edu.cmu.sphinx.api.Configuration;
    import edu.cmu.sphinx.api.SpeechResult;
    import edu.cmu.sphinx.api.StreamSpeechRecognizer;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.InputStream;

    public class EmployeeCode {

        private static final String ACOUSTIC_MODEL = "resource:/edu/cmu/sphinx/models/en-us/en-us";
        private static final String DICTIONARY_PATH = "models/language/TAR0779/0779.dic";
        private static final String LANGUAGE_MODEL = "models/language/TAR0779/0779.lm";

        public static void main(String[] args) throws Exception {

            System.out.println("Loading models...");

            // Point the recognizer at the acoustic model, dictionary and language model.
            Configuration configuration = new Configuration();
            configuration.setAcousticModelPath(ACOUSTIC_MODEL);
            configuration.setDictionaryPath(DICTIONARY_PATH);
            configuration.setLanguageModelPath(LANGUAGE_MODEL);

            StreamSpeechRecognizer lmRecognizer = new StreamSpeechRecognizer(configuration);

            // try-with-resources closes the WAV stream when recognition is finished.
            try (InputStream stream = new FileInputStream(new File("/Users/ha/NetBeansProjects/Dialog/WAV/sample1.wav"))) {
                lmRecognizer.startRecognition(stream);
                SpeechResult result;

                // The loop ends when getResult() returns null.
                while ((result = lmRecognizer.getResult()) != null) {
                    System.out.println("You said: " + result.getHypothesis() + '\n');
                }

                lmRecognizer.stopRecognition();
            }
        }
    }

  4. After running it, the output is:

    run:
    Loading models...
    Apr 16, 2015 2:04:10 PM edu.cmu.sphinx.linguist.acoustic.UnitManager getUnit INFO: CI Unit: *+NSN+
    Apr 16, 2015 2:04:10 PM edu.cmu.sphinx.linguist.acoustic.UnitManager getUnit INFO: CI Unit: *+SPN+
    Apr 16, 2015 2:04:10 PM edu.cmu.sphinx.linguist.acoustic.UnitManager getUnit INFO: CI Unit: AA
    Apr 16, 2015 2:04:10 PM edu.cmu.sphinx.linguist.acoustic.UnitManager getUnit INFO: CI Unit: AE
    Apr 16, 2015 2:04:10 PM edu.cmu.sphinx.linguist.acoustic.UnitManager getUnit INFO: CI Unit: AH
    Apr 16, 2015 2:04:10 PM edu.cmu.sphinx.linguist.acoustic.UnitManager getUnit INFO: CI Unit: AO
    Apr 16, 2015 2:04:10 PM edu.cmu.sphinx.linguist.acoustic.UnitManager getUnit INFO: CI Unit: AW
    Apr 16, 2015 2:04:10 PM edu.cmu.sphinx.linguist.acoustic.UnitManager getUnit INFO: CI Unit: AY
    Apr 16, 2015 2:04:10 PM edu.cmu.sphinx.linguist.acoustic.UnitManager getUnit INFO: CI Unit: B
    Apr 16, 2015 2:04:10 PM edu.cmu.sphinx.linguist.acoustic.UnitManager getUnit INFO: CI Unit: CH
    Apr 16, 2015 2:04:10 PM edu.cmu.sphinx.linguist.acoustic.UnitManager getUnit INFO: CI Unit: D
    Apr 16, 2015 2:04:10 PM edu.cmu.sphinx.linguist.acoustic.UnitManager getUnit INFO: CI Unit: DH
    Apr 16, 2015 2:04:10 PM edu.cmu.sphinx.linguist.acoustic.UnitManager getUnit INFO: CI Unit: EH
    Apr 16, 2015 2:04:10 PM edu.cmu.sphinx.linguist.acoustic.UnitManager getUnit INFO: CI Unit: ER
    Apr 16, 2015 2:04:10 PM edu.cmu.sphinx.linguist.acoustic.UnitManager getUnit INFO: CI Unit: EY
    Apr 16, 2015 2:04:10 PM edu.cmu.sphinx.linguist.acoustic.UnitManager getUnit INFO: CI Unit: F
    Apr 16, 2015 2:04:10 PM edu.cmu.sphinx.linguist.acoustic.UnitManager getUnit INFO: CI Unit: G
    Apr 16, 2015 2:04:10 PM edu.cmu.sphinx.linguist.acoustic.UnitManager getUnit INFO: CI Unit: HH
    Apr 16, 2015 2:04:10 PM edu.cmu.sphinx.linguist.acoustic.UnitManager getUnit INFO: CI Unit: IH
    Apr 16, 2015 2:04:10 PM edu.cmu.sphinx.linguist.acoustic.UnitManager getUnit INFO: CI Unit: IY
    Apr 16, 2015 2:04:10 PM edu.cmu.sphinx.linguist.acoustic.UnitManager getUnit INFO: CI Unit: JH
    Apr 16, 2015 2:04:10 PM edu.cmu.sphinx.linguist.acoustic.UnitManager getUnit INFO: CI Unit: K
    Apr 16, 2015 2:04:10 PM edu.cmu.sphinx.linguist.acoustic.UnitManager getUnit INFO: CI Unit: L
    Apr 16, 2015 2:04:10 PM edu.cmu.sphinx.linguist.acoustic.UnitManager getUnit INFO: CI Unit: M
    Apr 16, 2015 2:04:10 PM edu.cmu.sphinx.linguist.acoustic.UnitManager getUnit INFO: CI Unit: N
    Apr 16, 2015 2:04:10 PM edu.cmu.sphinx.linguist.acoustic.UnitManager getUnit INFO: CI Unit: NG
    Apr 16, 2015 2:04:10 PM edu.cmu.sphinx.linguist.acoustic.UnitManager getUnit INFO: CI Unit: OW
    Apr 16, 2015 2:04:10 PM edu.cmu.sphinx.linguist.acoustic.UnitManager getUnit INFO: CI Unit: OY
    Apr 16, 2015 2:04:10 PM edu.cmu.sphinx.linguist.acoustic.UnitManager getUnit INFO: CI Unit: P
    Apr 16, 2015 2:04:10 PM edu.cmu.sphinx.linguist.acoustic.UnitManager getUnit INFO: CI Unit: R
    Apr 16, 2015 2:04:10 PM edu.cmu.sphinx.linguist.acoustic.UnitManager getUnit INFO: CI Unit: S
    Apr 16, 2015 2:04:10 PM edu.cmu.sphinx.linguist.acoustic.UnitManager getUnit INFO: CI Unit: SH
    Apr 16, 2015 2:04:10 PM edu.cmu.sphinx.linguist.acoustic.UnitManager getUnit INFO: CI Unit: T
    Apr 16, 2015 2:04:10 PM edu.cmu.sphinx.linguist.acoustic.UnitManager getUnit INFO: CI Unit: TH
    Apr 16, 2015 2:04:10 PM edu.cmu.sphinx.linguist.acoustic.UnitManager getUnit INFO: CI Unit: UH
    Apr 16, 2015 2:04:10 PM edu.cmu.sphinx.linguist.acoustic.UnitManager getUnit INFO: CI Unit: UW
    Apr 16, 2015 2:04:10 PM edu.cmu.sphinx.linguist.acoustic.UnitManager getUnit INFO: CI Unit: V
    Apr 16, 2015 2:04:10 PM edu.cmu.sphinx.linguist.acoustic.UnitManager getUnit INFO: CI Unit: W
    Apr 16, 2015 2:04:10 PM edu.cmu.sphinx.linguist.acoustic.UnitManager getUnit INFO: CI Unit: Y
    Apr 16, 2015 2:04:10 PM edu.cmu.sphinx.linguist.acoustic.UnitManager getUnit INFO: CI Unit: Z
    Apr 16, 2015 2:04:10 PM edu.cmu.sphinx.linguist.acoustic.UnitManager getUnit INFO: CI Unit: ZH
    Apr 16, 2015 2:04:11 PM edu.cmu.sphinx.frontend.AutoCepstrum initDataProcessors INFO: Cepstrum component auto-configured as follows: autoCepstrum {MelFrequencyFilterBank, Denoise, DiscreteCosineTransform2, Lifter}
    Apr 16, 2015 2:04:11 PM edu.cmu.sphinx.linguist.dictionary.TextDictionary allocate INFO: Loading dictionary from: file:models/language/TAR0779/0779.dic
    Apr 16, 2015 2:04:12 PM edu.cmu.sphinx.linguist.dictionary.TextDictionary allocate INFO: Loading filler dictionary from: jar:file:/Users/ha/Downloads/sphinx4-data-1.0-20150223.210601-7-sources.jar!/edu/cmu/sphinx/models/en-us/en-us/noisedict
    Apr 16, 2015 2:04:12 PM edu.cmu.sphinx.linguist.acoustic.tiedstate.Sphinx3Loader logInfo INFO: Loading tied-state acoustic model from: jar:file:/Users/ha/Downloads/sphinx4-data-1.0-20150223.210601-7-sources.jar!/edu/cmu/sphinx/models/en-us/en-us
    Apr 16, 2015 2:04:12 PM edu.cmu.sphinx.linguist.acoustic.tiedstate.Pool logInfo INFO: Pool means Entries: 16128
    Apr 16, 2015 2:04:12 PM edu.cmu.sphinx.linguist.acoustic.tiedstate.Pool logInfo INFO: Pool variances Entries: 16128
    Apr 16, 2015 2:04:12 PM edu.cmu.sphinx.linguist.acoustic.tiedstate.Pool logInfo INFO: Pool transition_matrices Entries: 42
    Apr 16, 2015 2:04:12 PM edu.cmu.sphinx.linguist.acoustic.tiedstate.Pool logInfo INFO: Pool senones Entries: 5126
    Apr 16, 2015 2:04:12 PM edu.cmu.sphinx.linguist.acoustic.tiedstate.GaussianWeights logInfo INFO: Gaussian weights: mixture_weights. Entries: 15378
    Apr 16, 2015 2:04:12 PM edu.cmu.sphinx.linguist.acoustic.tiedstate.Pool logInfo INFO: Pool senones Entries: 5126
    Apr 16, 2015 2:04:12 PM edu.cmu.sphinx.linguist.acoustic.tiedstate.Sphinx3Loader logInfo INFO: Context Independent Unit Entries: 42
    Apr 16, 2015 2:04:12 PM edu.cmu.sphinx.linguist.acoustic.tiedstate.HMMManager logInfo INFO: HMM Manager: 137095 hmms
    Apr 16, 2015 2:04:12 PM edu.cmu.sphinx.linguist.acoustic.tiedstate.TiedStateAcousticModel logInfo INFO: CompositeSenoneSequences: 0
    Apr 16, 2015 2:04:13 PM edu.cmu.sphinx.linguist.acoustic.HMMPool dumpInfo INFO: Max CI Units 43
    Apr 16, 2015 2:04:13 PM edu.cmu.sphinx.linguist.acoustic.HMMPool dumpInfo INFO: Unit table size 79507
    Apr 16, 2015 2:04:13 PM edu.cmu.sphinx.util.TimerPool showTimesShortTitle INFO: # ----------------------------- Timers----------------------------------------
    Apr 16, 2015 2:04:13 PM edu.cmu.sphinx.util.TimerPool showTimesShortTitle INFO: # Name Count CurTime MinTime MaxTime AvgTime TotTime
    Apr 16, 2015 2:04:13 PM edu.cmu.sphinx.util.Timer showTimesShort INFO: Load AM 1 3.0410s 3.0410s 3.0410s 3.0410s 3.0410s
    Apr 16, 2015 2:04:13 PM edu.cmu.sphinx.util.Timer showTimesShort INFO: Load Dictionary 1 0.0520s 0.0520s 0.0520s 0.0520s 0.0520s
    Apr 16, 2015 2:04:13 PM edu.cmu.sphinx.util.Timer showTimesShort INFO: Compile 1 1.8290s 1.8290s 1.8290s 1.8290s 1.8290s
    Apr 16, 2015 2:04:17 PM edu.cmu.sphinx.instrumentation.SpeedTracker showAudioUsage INFO: This Time Audio: 0.95s Proc: 3.15s Speed: 3.32 X real time
    Apr 16, 2015 2:04:17 PM edu.cmu.sphinx.instrumentation.SpeedTracker showAudioSummary INFO: Total Time Audio: 0.95s Proc: 3.15s 3.32 X real time
    Apr 16, 2015 2:04:17 PM edu.cmu.sphinx.instrumentation.MemoryTracker calculateMemoryUsage INFO: Mem Total: 212.50 Mb Free: 70.12 Mb
    Apr 16, 2015 2:04:17 PM edu.cmu.sphinx.instrumentation.MemoryTracker calculateMemoryUsage INFO: Used: This: 142.38 Mb Avg: 142.38 Mb Max: 142.38 Mb
    You said: WHAT IS

    Apr 16, 2015 2:04:20 PM edu.cmu.sphinx.instrumentation.SpeedTracker showAudioUsage INFO: This Time Audio: 0.96s Proc: 2.45s Speed: 2.55 X real time
    Apr 16, 2015 2:04:20 PM edu.cmu.sphinx.instrumentation.SpeedTracker showAudioSummary INFO: Total Time Audio: 1.91s Proc: 5.60s 2.93 X real time
    Apr 16, 2015 2:04:20 PM edu.cmu.sphinx.instrumentation.MemoryTracker calculateMemoryUsage INFO: Mem Total: 237.00 Mb Free: 141.00 Mb
    Apr 16, 2015 2:04:20 PM edu.cmu.sphinx.instrumentation.MemoryTracker calculateMemoryUsage INFO: Used: This: 96.00 Mb Avg: 119.19 Mb Max: 142.38 Mb
    You said: MANY MEN

    Apr 16, 2015 2:04:21 PM edu.cmu.sphinx.instrumentation.SpeedTracker showAudioUsage INFO: This Time Audio: 1429182208.00s Proc: 1.19s Speed: 0.00 X real time
    Apr 16, 2015 2:04:21 PM edu.cmu.sphinx.instrumentation.SpeedTracker showAudioSummary INFO: Total Time Audio: 1429182208.00s Proc: 6.79s 0.00 X real time
    Apr 16, 2015 2:04:21 PM edu.cmu.sphinx.instrumentation.MemoryTracker calculateMemoryUsage INFO: Mem Total: 247.50 Mb Free: 144.35 Mb
    Apr 16, 2015 2:04:21 PM edu.cmu.sphinx.instrumentation.MemoryTracker calculateMemoryUsage INFO: Used: This: 103.15 Mb Avg: 113.84 Mb Max: 142.38 Mb
    You said: MANY

    Apr 16, 2015 2:04:21 PM edu.cmu.sphinx.util.TimerPool showTimesShortTitle INFO: # ----------------------------- Timers----------------------------------------
    Apr 16, 2015 2:04:21 PM edu.cmu.sphinx.util.TimerPool showTimesShortTitle INFO: # Name Count CurTime MinTime MaxTime AvgTime TotTime
    Apr 16, 2015 2:04:21 PM edu.cmu.sphinx.util.Timer showTimesShort INFO: Load AM 1 3.0410s 3.0410s 3.0410s 3.0410s 3.0410s
    Apr 16, 2015 2:04:21 PM edu.cmu.sphinx.util.Timer showTimesShort INFO: Load Dictionary 1 0.0520s 0.0520s 0.0520s 0.0520s 0.0520s
    Apr 16, 2015 2:04:21 PM edu.cmu.sphinx.util.Timer showTimesShort INFO: Score 586 0.0000s 0.0000s 0.2270s 0.0031s 1.8140s
    Apr 16, 2015 2:04:21 PM edu.cmu.sphinx.util.Timer showTimesShort INFO: Prune 2043 0.0000s 0.0000s 0.0020s 0.0000s 0.0280s
    Apr 16, 2015 2:04:21 PM edu.cmu.sphinx.util.Timer showTimesShort INFO: Grow 2051 0.0000s 0.0000s 0.9200s 0.0025s 5.1330s
    Apr 16, 2015 2:04:21 PM edu.cmu.sphinx.util.Timer showTimesShort INFO: Frontend 298 0.0000s 0.0000s 0.2100s 0.0009s 0.2640s
    Apr 16, 2015 2:04:21 PM edu.cmu.sphinx.util.Timer showTimesShort INFO: Compile 1 1.8290s 1.8290s 1.8290s 1.8290s 1.8290s
    Apr 16, 2015 2:04:21 PM edu.cmu.sphinx.instrumentation.SpeedTracker showAudioSummary INFO: Total Time Audio: 1429182208.00s Proc: 6.79s 0.00 X real time
    Apr 16, 2015 2:04:21 PM edu.cmu.sphinx.instrumentation.MemoryTracker calculateMemoryUsage INFO: Mem Total: 247.50 Mb Free: 141.87 Mb
    Apr 16, 2015 2:04:21 PM edu.cmu.sphinx.instrumentation.MemoryTracker calculateMemoryUsage INFO: Used: This: 105.63 Mb Avg: 111.79 Mb Max: 142.38 Mb
    BUILD SUCCESSFUL (total time: 28 seconds)

The correct result should be: what is the minimum salary.

My WAV file is: https://www.mediafire.com/?khgyc9bhltz0z3b
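The format of a recording like this one can be inspected with the standard `javax.sound.sampled` API. The sketch below assumes the recognizer wants 16 kHz, 16-bit, mono, little-endian signed PCM, which is my understanding of what the default en-us model is meant for; the class name `WavFormatCheck` and the method `matchesExpected` are made up for illustration and are not part of Sphinx4.

```java
import java.io.File;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;

public class WavFormatCheck {

    /**
     * Returns true if the format is 16 kHz, 16-bit, mono, little-endian signed PCM.
     * These target values are an assumption about what the en-us model expects.
     */
    public static boolean matchesExpected(AudioFormat format) {
        return AudioFormat.Encoding.PCM_SIGNED.equals(format.getEncoding())
                && format.getSampleRate() == 16000f
                && format.getSampleSizeInBits() == 16
                && format.getChannels() == 1
                && !format.isBigEndian();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.out.println("usage: java WavFormatCheck <file.wav>");
            return;
        }
        // Read only the header information; the audio data itself is not decoded here.
        try (AudioInputStream audio = AudioSystem.getAudioInputStream(new File(args[0]))) {
            AudioFormat format = audio.getFormat();
            System.out.println("Format: " + format);
            System.out.println(matchesExpected(format)
                    ? "Format matches the assumed 16 kHz / 16-bit / mono target."
                    : "Format differs from the assumed target; consider resampling the file.");
        }
    }
}
```

Running it with the path to `sample1.wav` prints the detected format, which can then be compared with what the acoustic model was trained on.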

How can I improve the recognition accuracy for my WAV file?

Thanks in advance

Upvotes: 1

Views: 667

Answers (1)

Nikolay Shmyrev

Reputation: 25220

private static final String ACOUSTIC_MODEL = "models/acoustic/wsj";

This is wrong; you need to use the default en-us model.

I have deleted a lot of lines of missing a phonetic transcription for words in my corpus

The corpus must be a plain text file, not an RTF file. You need to create the language model and dictionary again.
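Before rebuilding, both points can be checked with a few lines of plain Java. This is only a sketch under some assumptions: an RTF file is recognized by its `{\rtf` signature, the dictionary has one entry per line in the form `WORD PH1 PH2 ...` with alternative pronunciations written as `WORD(2)`, and words are compared case-insensitively. The class and method names (`CorpusCheck`, `looksLikeRtf`, `dictionaryWords`, `missingWords`) are hypothetical helpers, not Sphinx4 API.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class CorpusCheck {

    /** Returns true if the text starts with the RTF signature "{\rtf". */
    public static boolean looksLikeRtf(String corpusText) {
        return corpusText.trim().startsWith("{\\rtf");
    }

    /** Collects the head word of every dictionary line, upper-cased. */
    public static Set<String> dictionaryWords(List<String> dictLines) {
        Set<String> words = new HashSet<>();
        for (String line : dictLines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String head = trimmed.split("\\s+")[0];
            // Strip the "(2)", "(3)" suffix used for alternative pronunciations.
            words.add(head.replaceAll("\\(\\d+\\)$", "").toUpperCase());
        }
        return words;
    }

    /** Returns the corpus words (letters and apostrophes only) that have no dictionary entry. */
    public static Set<String> missingWords(List<String> corpusLines, Set<String> dictWords) {
        Set<String> missing = new TreeSet<>();
        for (String line : corpusLines) {
            for (String token : line.trim().split("\\s+")) {
                String word = token.replaceAll("[^A-Za-z']", "").toUpperCase();
                if (!word.isEmpty() && !dictWords.contains(word)) {
                    missing.add(word);
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: java CorpusCheck <corpus.txt> <dictionary.dic>");
            return;
        }
        List<String> corpus = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        List<String> dict = Files.readAllLines(Paths.get(args[1]), StandardCharsets.UTF_8);

        if (looksLikeRtf(String.join("\n", corpus))) {
            System.out.println("Corpus looks like RTF; save it as plain text and rebuild.");
            return;
        }
        System.out.println("Words without a pronunciation: " + missingWords(corpus, dictionaryWords(dict)));
    }
}
```

If the list of missing words is long, that usually points to a problem with the corpus file rather than with individual words, so regenerating the model from a clean text file is likely a better fix than deleting lines.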

Upvotes: 1
