Phlox Midas

Reputation: 4349

Finding individual matched phrases in SpeechRecognizedEventArgs for Microsoft speech recognition

I don't seem to be able to extract the information I want from a SpeechRecognizedEventArgs. My grammar has the phrases "one" and "left arrow". If I say both, one right after the other, my recogniser finds them in the grammar because I have a max repetition of five, but I can't distinguish the phrases in the result. The SpeechRecognizedEventArgs result Text is "one left arrow" when I want "one, left arrow" or a list where the first item is "one" and the second is "left arrow".

I've found a "Words" property which is nearly what I want but not quite. I'd settle for a way to make them comma separated, or some event that fires for each single phrase from the grammar as it is found, so I get them one by one instead of in an inseparable group. Some of my code:

    var cultureInfo = new System.Globalization.CultureInfo("en-US");
    recognizer_ = new SpeechRecognitionEngine(cultureInfo);
    var choices = LoadWordChoices();
    var gb = new GrammarBuilder();
    gb.Append(choices, 1, 5);
    var grammar = new Grammar(gb);
    recognizer_.LoadGrammar(grammar);
    recognizer_.SpeechRecognized +=
        new EventHandler<SpeechRecognizedEventArgs>(recognizer_SpeechRecognized);

And the event:

    void recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
    {
        // e.g. "one left arrow"
        // when I'd like either "one,left arrow" or a list
        Console.WriteLine(e.Result.Text); 
        // ...
    }
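For reference, here is roughly what the Words route gives (a sketch; string.Join and LINQ's Select are the only additions to the handler above). Each RecognizedWordUnit is a single word, so a phrase like "left arrow" still splits into two entries rather than staying together as one phrase:

    // Sketch: joining e.Result.Words with commas (requires using System.Linq;).
    void recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
    {
        // Each RecognizedWordUnit is one word, so "one left arrow" yields
        // three units and prints "one,left,arrow" -- the boundary between
        // the phrases "one" and "left arrow" is lost.
        var joined = string.Join(",", e.Result.Words.Select(w => w.Text));
        Console.WriteLine(joined);
    }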

Edit: An attempt using Semantics works when I utter a single phrase, e.g. "left arrow", but when I say "one left arrow", it crashes with the following error: "An unhandled exception of type 'System.Reflection.TargetInvocationException' occurred in mscorlib.dll. Additional information: Exception has been thrown by the target of an invocation." Here's my attempt:

        var gb = new GrammarBuilder();
        var choices = new Choices();
        var words = LoadWords(); // string[] of "one", "left arrow" etc.
        foreach (var word in words)
        {
            choices.Add(new SemanticResultValue(word, word));
        }
        gb.Append(choices, 1, 5);
        return gb;

Edit 2: Here is a minimal program to reproduce the error:

    class MySpeech
    {
        private SpeechRecognitionEngine recognizer_;

        public MySpeech()
        {
            var cultureInfo = new System.Globalization.CultureInfo("en-US");
            recognizer_ = new SpeechRecognitionEngine(cultureInfo);
            var gb = CreateGrammarBuilder();
            var grammar = new Grammar(gb);
            recognizer_.LoadGrammar(grammar);

            recognizer_.SpeechRecognized +=
                new EventHandler<SpeechRecognizedEventArgs>(recognizer_SpeechRecognized);
            recognizer_.SetInputToDefaultAudioDevice();
            recognizer_.RecognizeAsync(RecognizeMode.Multiple);
        }

        private GrammarBuilder CreateGrammarBuilder()
        {
            var gb = new GrammarBuilder();
            var choices = new Choices();
            var words = new string[] { "one", "left arrow" };
            foreach (var word in words)
            {
                choices.Add(new SemanticResultValue(word, word));
            }
            var gbChoices = new GrammarBuilder(choices);
            var key = new SemanticResultKey("press", gbChoices);
            gb.Append(key, 1, 5);
            return gb;
        }

        void recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
        {
            Console.WriteLine("Recognized: " + e.Result.Text);
        }
    }

Upvotes: 2

Views: 354

Answers (2)

Jan Van Overbeke

Reputation: 280

I got rid of the TargetInvocationException simply by adding a DictationGrammar.

    _speech.LoadGrammar(new Grammar(new Choices(commands)) { Name = "commands" });
    _speech.LoadGrammar(new DictationGrammar() { Name = "_background" });

In the SpeechRecognized event, you can check whether the result comes from your commands grammar.

    _speech.SpeechRecognized += (object sender, SpeechRecognizedEventArgs e) =>
    {
        if (e.Result.Grammar.Name == "commands")
        {
            // command recognized
        }
        else
        {
            // "background noise"
        }
    };

The result: no more crashes, and very accurate and stable recognition of commands (or, in your case, perhaps individual words?).
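For completeness, here is a minimal end-to-end sketch of this two-grammar approach applied to the asker's setup (the recognizer_ and words names are borrowed from the question; everything else follows the snippets above). Without the 1-5 repetition, each recognition result is a single phrase from the commands grammar:

    var recognizer_ = new SpeechRecognitionEngine(new System.Globalization.CultureInfo("en-US"));
    var words = new string[] { "one", "left arrow" };

    // Load the command grammar and a background dictation grammar side by side.
    recognizer_.LoadGrammar(new Grammar(new Choices(words)) { Name = "commands" });
    recognizer_.LoadGrammar(new DictationGrammar() { Name = "_background" });

    recognizer_.SpeechRecognized += (sender, e) =>
    {
        if (e.Result.Grammar.Name == "commands")
        {
            // Each result is a single phrase from the commands grammar,
            // e.g. "one" or "left arrow".
            Console.WriteLine("Command: " + e.Result.Text);
        }
        // Anything else matched the dictation grammar and can be ignored.
    };

    recognizer_.SetInputToDefaultAudioDevice();
    recognizer_.RecognizeAsync(RecognizeMode.Multiple);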

Upvotes: 1

Eric Brown

Reputation: 13942

You want to be using SemanticResultKey and SemanticResultValue objects in your grammar, and then you can use e.Result.Semantics to extract the various results.

The SemanticValue is a dictionary; its values can also be SemanticValues, resulting in a tree of values.

Note that SemanticResultValues have to be associated with SemanticResultKeys.

    var gb = new GrammarBuilder();
    var choices = new Choices();
    var words = LoadWords(); // string[] of "one", "left arrow" etc.
    foreach (var word in words)
    {
        choices.Add(new SemanticResultValue(word, word));
    }
    var gbchoices = new GrammarBuilder(choices);
    var key = new SemanticResultKey("words", gbchoices);

    gb.Append(key, 1, 5);  // use implicit conversion from SemanticResultKey to GrammarBuilder
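As a sketch of the reading side (not part of the answer), the handler below assumes the "words" key from the grammar above and a single matched phrase; how the semantics are structured when the repeated key matches more than once is exactly the crash the question reports, so that case is left out:

    void recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
    {
        // e.Result.Semantics is a SemanticValue acting as a dictionary of
        // child SemanticValues, keyed by the SemanticResultKey names.
        if (e.Result.Semantics.ContainsKey("words"))
        {
            // Value holds the object passed to SemanticResultValue,
            // e.g. "one" or "left arrow".
            Console.WriteLine(e.Result.Semantics["words"].Value);
        }
    }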

Upvotes: 4
