Reputation: 431
I'm working on a personal project involving microphones in my apartment that I can issue verbal commands to. To accomplish this, I've been using the Microsoft Speech API, and specifically SpeechRecognitionEngine from System.Speech.Recognition in C#. I construct a grammar as follows:
// validCommands is a Choices object containing all valid command strings
// recognizer is a SpeechRecognitionEngine
GrammarBuilder builder = new GrammarBuilder(recognitionSystemName);
builder.Append(validCommands);
recognizer.SetInputToDefaultAudioDevice();
recognizer.LoadGrammar(new Grammar(builder));
recognizer.RecognizeAsync(RecognizeMode.Multiple);
// etc ...
This seems to work pretty well for the case when I actually give it a command. It hasn't misidentified one of my commands yet. Unfortunately, it also tends to pick up random talking as commands! I've tried to ameliorate this by prefacing the command Choices object with a "name" (recognitionSystemName), which I address the system as. Oddly, this doesn't seem to help. I am restricting it to a set of predetermined command phrases, so I would have thought that it would be able to detect if speech wasn't any of the strings. My best guess is that it's assuming that all sound is a command and picking the best match from the command set. Any advice on improving this system so that it no longer triggers off of conversation not directed at it would be very helpful.
Edit: I've moved the name recognizer to a separate SpeechRecognitionEngine, but the accuracy is awful. Here's a bit of test code I wrote to examine the accuracy:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Speech.Recognition;
namespace RecognitionAccuracyTest
{
class RecognitionAccuracyTest
{
static int recogcount;
[STAThread]
static void Main()
{
recogcount = 0;
System.Console.WriteLine("Beginning speech recognition accuracy test.");
SpeechRecognitionEngine recognizer;
recognizer = new SpeechRecognitionEngine(new System.Globalization.CultureInfo("en-US"));
recognizer.SetInputToDefaultAudioDevice();
recognizer.LoadGrammar(new Grammar(new GrammarBuilder("Octavian")));
recognizer.SpeechHypothesized += new EventHandler<SpeechHypothesizedEventArgs>(recognizer_SpeechHypothesized);
recognizer.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(recognizer_SpeechRecognized);
recognizer.RecognizeAsync(RecognizeMode.Multiple);
// Block on console input instead of spinning the CPU; press Enter to exit.
System.Console.ReadLine();
}
static void recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
System.Console.WriteLine("Recognized @ " + e.Result.Confidence);
try
{
if (e.Result.Audio != null)
{
// Save the recognized audio; the using block closes the file even if the write fails.
using (System.IO.FileStream stream = new System.IO.FileStream("audio" + ++recogcount + ".wav", System.IO.FileMode.Create))
{
e.Result.Audio.WriteToWaveStream(stream);
}
}
}
catch (Exception ex) { System.Console.WriteLine("Could not save audio: " + ex.Message); }
}
static void recognizer_SpeechHypothesized(object sender, SpeechHypothesizedEventArgs e)
{
System.Console.WriteLine("Hypothesized @ " + e.Result.Confidence);
}
}
}
If the name is "Octavian", it recognizes stuff like "Octopus", "Octagon", "Volkswagen", and "Wow, really?". I can clearly hear the difference in the associated audio clips. Any ideas on making this not awful would be great.
Upvotes: 1
Views: 2491
Reputation: 966
Is it possible that you just need to run UnloadAllGrammars() prior to creating/loading the grammar that you want to use?
Upvotes: 0
Reputation: 1621
I have the same problem. I'm using the Microsoft Speech Platform, so the accuracy may be a little different.
I'm using Claire as a wake-up command, and it does recognize other words as Claire too. The problem is that the engine hears you speak and searches for the closest match.
I haven't found a really good solution to this. You could try filtering the recognized speech on the Confidence field, but that is not very reliable with my chosen recognizer engine. What I do is put every word that I want to recognize in one big SRGS.xml and set the repeat value to 0-. I then only accept the recognized sentence if Claire is the first word. It doesn't work as well as I'd like, but it is a small improvement.
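The two checks described above (a confidence cutoff and "the wake word must come first") can be sketched as one small helper. This is only an illustration: WakeWordFilter and ShouldAccept are hypothetical names, not part of System.Speech or Microsoft.Speech, and the threshold is something you would have to tune for your own engine and microphone.

```csharp
using System;

// Hypothetical helper; not part of System.Speech or Microsoft.Speech.
static class WakeWordFilter
{
    // Returns true only when the engine's confidence reaches minConfidence,
    // the first word is the wake word, and at least one more word follows it.
    public static bool ShouldAccept(string text, float confidence,
                                    string wakeWord, float minConfidence)
    {
        if (string.IsNullOrWhiteSpace(text) || confidence < minConfidence)
            return false;

        string[] words = text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        return words.Length > 1
            && string.Equals(words[0], wakeWord, StringComparison.OrdinalIgnoreCase);
    }
}
```

In a SpeechRecognized handler you would pass e.Result.Text and e.Result.Confidence, and ignore the result when the helper returns false.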
I'm currently busy with it, and I will post more info as I progress.
EDIT 1: In response to what Dims says: an SRGS grammar can include a "Garbage" rule. You might want to look into that. http://www.w3.org/TR/speech-grammar/
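As a rough sketch of the garbage-rule idea, the grammar below offers the recognizer two paths: the wake word followed by a known command, or the special GARBAGE rule. It is built with the System.Speech.Recognition.SrgsGrammar classes rather than a hand-written XML file. The class name, the rule id and the grammar name are made up for illustration, and how well GARBAGE absorbs background talk depends on the installed recognizer, so treat this as something to experiment with rather than a known fix.

```csharp
using System.Speech.Recognition;
using System.Speech.Recognition.SrgsGrammar;

// Illustrative only; the names below are placeholders.
static class GarbageGrammarSketch
{
    public static Grammar Build(string wakeWord, params string[] commands)
    {
        // Path 1: the wake word followed by exactly one known command.
        SrgsItem addressed = new SrgsItem(
            new SrgsItem(wakeWord),
            new SrgsOneOf(commands));

        // Path 2: the special GARBAGE rule, intended to match other speech.
        SrgsItem ignored = new SrgsItem(SrgsRuleRef.Garbage);

        // The root rule lets the engine choose between the two paths.
        SrgsRule root = new SrgsRule("root", new SrgsOneOf(addressed, ignored));

        SrgsDocument document = new SrgsDocument();
        document.Rules.Add(root);
        document.Root = root;

        return new Grammar(document) { Name = "commands-with-garbage" };
    }
}
```

A handler would still need to check that the recognized text begins with the wake word before acting on it, since a result that went down the garbage path is not a command.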
Upvotes: 1
Reputation: 13297
Let me make sure I understand: you want a phrase that sets commands to the system apart, like "butler" or "Siri". So you'll say "Butler, turn on TV". You can build this into your grammar.
Here is an example of a simple grammar that requires an opening phrase before it recognizes a command. It uses semantic results to help you understand what was said. In this case the user must say "Open", "Please open", or "Can you open".
private Grammar CreateTestGrammar()
{
// item
Choices item = new Choices();
SemanticResultValue itemSRV;
itemSRV = new SemanticResultValue("I E", "explorer");
item.Add(itemSRV);
itemSRV = new SemanticResultValue("explorer", "explorer");
item.Add(itemSRV);
itemSRV = new SemanticResultValue("firefox", "firefox");
item.Add(itemSRV);
itemSRV = new SemanticResultValue("mozilla", "firefox");
item.Add(itemSRV);
itemSRV = new SemanticResultValue("chrome", "chrome");
item.Add(itemSRV);
itemSRV = new SemanticResultValue("google chrome", "chrome");
item.Add(itemSRV);
SemanticResultKey itemSemKey = new SemanticResultKey("item", item);
//build the permutations of choices...
GrammarBuilder gb = new GrammarBuilder();
gb.Append(itemSemKey);
//now build the complete pattern...
GrammarBuilder itemRequest = new GrammarBuilder();
//preamble: "Can you open", "Open" or "Please open"
itemRequest.Append(new Choices("Can you open", "Open", "Please open"));
itemRequest.Append(gb);
Grammar TestGrammar = new Grammar(itemRequest);
return TestGrammar;
}
You can then process the speech with something like:
RecognitionResult result = myRecognizer.Recognize();
and check for semantic results like:
// Recognize() returns null when nothing was recognized, so check that first.
if(result != null && result.Semantics.ContainsKey("item"))
{
string s = (string)result.Semantics["item"].Value;
}
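If you run the recognizer continuously with RecognizeAsync(RecognizeMode.Multiple), as in the question, the same semantic check can live in a SpeechRecognized handler instead. A minimal sketch, assuming the grammar above has already been loaded (SemanticHandlerSketch and Attach are illustrative names):

```csharp
using System;
using System.Speech.Recognition;

// Illustrative wiring; assumes the grammar with the "item" key is loaded.
static class SemanticHandlerSketch
{
    public static void Attach(SpeechRecognitionEngine recognizer)
    {
        recognizer.SpeechRecognized += (sender, e) =>
        {
            // Semantics is keyed by the name passed to SemanticResultKey.
            if (e.Result.Semantics.ContainsKey("item"))
            {
                string item = (string)e.Result.Semantics["item"].Value;
                Console.WriteLine("Open request for: " + item);
            }
        };
    }
}
```

Because "I E" and "explorer" both map to the value "explorer", the handler only has to deal with one string per application.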
Upvotes: 2
Reputation: 51209
In principle, you need to update either the grammar or the dictionary so that it contains "empty" or "anything" entries that non-command speech can match.
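One way to try an "anything" entry with System.Speech is to load a DictationGrammar next to the command grammar and ignore whatever the dictation grammar matched. This swaps in a different technique from editing the dictionary, and it is only a sketch: the class name and grammar names are made up, and as far as I know DictationGrammar is available in System.Speech but not in the Microsoft Speech Platform (Microsoft.Speech).

```csharp
using System;
using System.Speech.Recognition;

// Illustrative only; the grammar names are arbitrary labels.
static class CatchAllSketch
{
    const string CommandGrammarName = "commands";

    public static void Configure(SpeechRecognitionEngine recognizer, Grammar commands)
    {
        commands.Name = CommandGrammarName;
        recognizer.LoadGrammar(commands);

        // Free-form dictation competes with the command grammar for each utterance.
        recognizer.LoadGrammar(new DictationGrammar { Name = "anything" });

        recognizer.SpeechRecognized += (sender, e) =>
        {
            // Results matched by the dictation grammar are treated as background talk.
            if (e.Result.Grammar.Name != CommandGrammarName)
                return;

            Console.WriteLine("Command: " + e.Result.Text);
        };
    }
}
```

The idea is that ordinary conversation now has somewhere to go other than the closest command, which is the behavior the question complains about.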
Upvotes: 0