portsample

Reputation: 2112

Improving accuracy of speech recognition using Vosk (Kaldi) running on Android

I am developing an Android application for collecting data in the field using speech recognition. It recognizes five "target words" as well as several numbers (zero, one, ten, one hundred, etc.).

I have improved the accuracy for the target words by adding homonyms (homophones) and vernacular synonyms for each one. The target words are Chinook, sockeye, coho, pink, and chum. This is the relevant code:

 public void parseWords() {
    List<String> szlNumbers = Arrays.asList(new String[]{"ONE", "TEN", "ONE HUNDRED", "ONE THOUSAND", "TEN THOUSAND"});
    //species with phonemes and vernacular names
    List<String> szlChinook = Arrays.asList("CHINOOK", "CHINOOK SALMON", "KING", "KINGS", "KING SALMON", "KING SALMAN");
    List<String> szlSockeye = Arrays.asList("SOCKEYE", "SOCCER", "SOCKEYE SALMON", "SOCK ICE", "SOCCER ICE", "SOCK I SAID", "SOCCER IS", "OKAY SALMON", "RED SALMON", "READ SALMON", "RED", "REDS");
    List<String> szlCoho = Arrays.asList("COHO", "COHO SALMON", "COVER SALMON", "SILVER SALMON", "SILVER", "SILVERS", "CO", "KOBO", "GO HOME", "COMO", "COVER", "GO");
    List<String> szlPink = Arrays.asList("PINK", "A PINK", "PINKS", "PINK SALMON", "HANK SALMON", "EXAMINE", "HUMPY", "HOBBY", "HUMPIES", "HUM BE", "HUM P", "BE", "HUMPTY", "HOBBIES", "HUMVEE", "THE HUMVEES", "POMPEY");
    List<String> szlChum = Arrays.asList("CHUM", "JOHN", "JUMP", "SHARMA", "CHARM", "COME", "CHARM SALMON", "COME SALMON", "CHUM SALMON", "JUMP SALMON", "TRUMP SALMON", "KETA SALMON", "KETA", "DOG", "DOGS", "DOG SALMON", "GATOR", "GATORS", "CALICO", "A CALICO");

    //Collections.sort(szlChinook); //what is this?

    // Check for null first: calling a method on a null string throws a NullPointerException
    if (szVoskOutput == null || szVoskOutput.isEmpty()) {
        //do nothing, this is a null or blank string
        return;
    }
    szVoskOutput = szVoskOutput.toUpperCase();
    //pink
    if (szlPink.contains(szVoskOutput)) {
        szSpecies = "Pink";
        populateSpecies();
        return;
    }
    //chum
    if (szlChum.contains(szVoskOutput)) {
        szSpecies = "Chum";
        populateSpecies();
        return;
    }
    //sockeye
    if (szlSockeye.contains(szVoskOutput)) {
        szSpecies = "Sockeye";
        populateSpecies();
        return;
    }
    //coho
    if (szlCoho.contains(szVoskOutput)) {
        szSpecies = "Coho";
        populateSpecies();
        return;
    }
    //Chinook
    if (szlChinook.contains(szVoskOutput)) {
        szSpecies = "Chinook";
        populateSpecies();
        return;
    }
    if (szlNumbers.contains(szVoskOutput)) {//then this is a number, put in count txt box
        tvCount.setText(szVoskOutput);
        return;
    }
    //no list matched, so ask the user to try again
    Toast.makeText(this, "Please repeat clearly. Captured string is: " + szVoskOutput, Toast.LENGTH_SHORT).show();
}//end parseWords()
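
For comparison, the chain of contains() checks could be collapsed into a single lookup table that maps each phrase to its species. This is only a minimal sketch, not the code from the app: the class name SpeciesLookup and its methods are hypothetical, and only a few of the aliases from the lists above are registered.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper (not part of the app): maps a recognized phrase to a species.
final class SpeciesLookup {
    private static final Map<String, String> ALIASES = new HashMap<>();

    static {
        // Only a few aliases per species are shown; the full lists would go here.
        register("Chinook", "CHINOOK", "KING", "KING SALMON");
        register("Sockeye", "SOCKEYE", "RED SALMON", "SOCK ICE");
        register("Coho", "COHO", "SILVER", "GO HOME");
        register("Pink", "PINK", "HUMPY", "HUMVEE");
        register("Chum", "CHUM", "KETA", "DOG SALMON");
    }

    private static void register(String species, String... phrases) {
        for (String phrase : phrases) {
            ALIASES.put(phrase, species);
        }
    }

    // Returns the species for a recognized phrase, or null if there is no match.
    static String resolve(String voskOutput) {
        if (voskOutput == null) {
            return null;
        }
        String key = voskOutput.trim().toUpperCase(Locale.ROOT);
        if (key.isEmpty()) {
            return null;
        }
        return ALIASES.get(key);
    }
}
```

With something like this, parseWords() would make one call to SpeciesLookup.resolve(szVoskOutput) and fall through to the number check when the result is null.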

I have a streamlined version of the application with source code on GitHub: https://github.com/portsample/salmonTalkerLite as well as the latest full version on Google Play: https://play.google.com/store/apps/details?id=net.blepsias.salmontalker

Using the target word and homonyms, I can get hits in four to five seconds. I would like to make this faster. What can I do to further tune for speed?

Upvotes: 0

Views: 3093

Answers (1)

portsample

Reputation: 2112

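Restricting the recognizer to a fixed grammar, passed as the third argument to the Recognizer constructor, helped out significantly. Recognition time is now consistently about 1.5 seconds.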

  private void recognizeMicrophone() {
    if (speechService != null) {
        setUiState(iSTATE_DONE);
        speechService.stop();
        speechService = null;
    } else {
        setUiState(iSTATE_MIC);
        try {
            // The grammar is a JSON array of phrases; "[unk]" catches anything outside the list
            Recognizer rec = new Recognizer(model, 16000.f, "[\"sockeye pink coho chum chinook atlantic salmon\", \"[unk]\"]");
            speechService = new SpeechService(rec, 16000.0f);
            speechService.startListening(this);
        } catch (IOException e) {
            setErrorState(e.getMessage());
        }
    }
}

This clears out the extraneous Vosk output upstream, leaving only the specified target words. It eliminates the need for the elaborate homonym-matching conditionals shown in the original post. Thanks to Nickolay Shmyrev for this. I am still looking for other ways to speed up recognition or otherwise improve this process.
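
The grammar argument is a JSON array of strings, and hand-escaping the quotes inside a Java string literal is easy to get wrong. One way around that is to build the string from a list. This is a rough sketch under that assumption: GrammarBuilder is a hypothetical helper, not part of Vosk, and it escapes only backslashes and double quotes.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of Vosk): builds the JSON array string for the grammar.
final class GrammarBuilder {
    static String build(List<String> phrases) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < phrases.size(); i++) {
            if (i > 0) {
                sb.append(", ");
            }
            // Escape backslashes first, then double quotes, so the result stays valid JSON.
            String escaped = phrases.get(i).replace("\\", "\\\\").replace("\"", "\\\"");
            sb.append('"').append(escaped).append('"');
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        List<String> phrases = Arrays.asList(
                "sockeye pink coho chum chinook atlantic salmon", "[unk]");
        System.out.println(build(phrases));
    }
}
```

The result could then be passed where the literal is used above, for example new Recognizer(model, 16000.f, GrammarBuilder.build(phrases)).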

Updates and improvements will be reflected in source code on GitHub: https://github.com/portsample/salmonTalkerLite

Upvotes: 0
