Reputation: 4349
I have a program that recognizes speech quite well with System.Speech, using SpeechRecognitionEngine. Although it is accurate, it seems to throw away some of the audio input it receives. If I say "one, two, three" with a pause between each word, it transcribes each word correctly. If I say them without pausing, however, it transcribes the first and sometimes the third word correctly; the second word is simply ignored.
Other people have had this problem, but I haven't been able to discover their solutions (see, for example, "Microsoft Speech Recognition Speed").
If I could, I would set the recorder audio position back to an earlier point in the audio stream, but I haven't found a function in the API that lets me do this. Another approach I considered was to run multiple recognition engines, where each one handles a single word and is reused once it has finished with that word, but that is a very complex and resource-hungry solution.
Any help on this problem would be appreciated.
I've cut it down to this piece of C# code:
// Requires: using System; using System.Speech.Recognition;
private SpeechRecognitionEngine recognizer_;

public void Init()
{
    // Create an in-process speech recognizer for the en-US locale.
    var cultureInfo = new System.Globalization.CultureInfo("en-US");
    recognizer_ = new SpeechRecognitionEngine(cultureInfo);

    // Build the list of words the grammar should accept.
    var numbers = new Choices();
    numbers.Add(new string[] { "one", "two", "three" });

    // Create a GrammarBuilder object and append the Choices object.
    GrammarBuilder gb = new GrammarBuilder();
    gb.Append(numbers);
    var g = new Grammar(gb);
    recognizer_.LoadGrammar(g);

    // Add handlers for the speech recognized and speech detected events.
    recognizer_.SpeechRecognized +=
        new EventHandler<SpeechRecognizedEventArgs>(recognizer_SpeechRecognized);
    recognizer_.SpeechDetected += recognizer_SpeechDetected;

    // Configure input to the speech recognizer.
    recognizer_.SetInputToDefaultAudioDevice();

    // Start asynchronous, continuous speech recognition.
    recognizer_.RecognizeAsync(RecognizeMode.Multiple);
}

// Handle the SpeechDetected event: log the audio positions for debugging.
void recognizer_SpeechDetected(object sender, SpeechDetectedEventArgs e)
{
    Console.WriteLine("\nspeech detected event audio position:\t\t" + e.AudioPosition);
    Console.WriteLine("speech detected current audio position:\t\t" + recognizer_.AudioPosition);
    Console.WriteLine("speech detected recognizer audio position:\t" + recognizer_.RecognizerAudioPosition);
}

// Handle the SpeechRecognized event.
void recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    Console.WriteLine("speech recognized event audio position:\t\t" + e.Result.Audio.AudioPosition);
    Console.WriteLine("speech recognized event audio start time: " + e.Result.Audio.StartTime);
    Console.WriteLine(e.Result.Text);
    // do things
    // ...
}
Upvotes: 1
Views: 2580
Reputation: 25220
Instead of
gb.Append(numbers);
which tells the recognizer to expect a single, isolated number per utterance, try something like
gb.Append(new GrammarBuilder(numbers), 1, 5);
which lets it recognize sequences of one to five numbers in a single utterance. Adjust the repetition counts to suit your needs.
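For context, here is a minimal sketch of how that change might fit into a complete program. It assumes a Windows machine with a default microphone and an installed en-US recognizer; the class name NumberSequenceDemo and the choice to print the individual words from e.Result.Words are illustrative and not part of the original question. I have not verified that it removes every dropped word, only that the grammar now accepts several numbers in one utterance.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Speech.Recognition;

// Hypothetical demo class; the name is an assumption for illustration.
class NumberSequenceDemo
{
    static void Main()
    {
        using (var recognizer = new SpeechRecognitionEngine(new CultureInfo("en-US")))
        {
            // The vocabulary from the question.
            var numbers = new Choices("one", "two", "three");

            // Accept between 1 and 5 numbers spoken back to back,
            // instead of exactly one isolated number.
            var gb = new GrammarBuilder();
            gb.Culture = recognizer.RecognizerInfo.Culture;
            gb.Append(new GrammarBuilder(numbers), 1, 5);
            recognizer.LoadGrammar(new Grammar(gb));

            // A single result can now contain several words;
            // e.Result.Words exposes them individually.
            recognizer.SpeechRecognized += (sender, e) =>
            {
                var words = e.Result.Words.Select(w => w.Text);
                Console.WriteLine(string.Join(", ", words));
            };

            recognizer.SetInputToDefaultAudioDevice();
            recognizer.RecognizeAsync(RecognizeMode.Multiple);

            Console.WriteLine("Speak some numbers; press Enter to quit.");
            Console.ReadLine();

            // Stop recognition before the engine is disposed.
            recognizer.RecognizeAsyncCancel();
        }
    }
}
```

With this grammar, saying "one two three" without pauses should arrive as one SpeechRecognized event containing three words rather than as three separate events, so any per-word handling needs to loop over e.Result.Words.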
Upvotes: 2