Reputation: 746
I am trying to implement a voice-cue system for a client where they can assign a word or a phrase to a slide in PowerPoint, and when they speak that word or phrase, the slide advances. Here is the code I am using to create the grammar (I use Microsoft's SpeechRecognitionEngine for the actual work).
Choices choices = new Choices();
string word = speechSlide.Scenes[speechSlide.currentslide].speechCue;
if (word.Trim() != "")
{
    choices.Add(word);
    GrammarBuilder builder = new GrammarBuilder(choices);
    Grammar directions = new Grammar(builder);
    return directions;
}
I tried raising the confidence threshold, but I still get too many false positives. Is there a way to improve the grammar? I suspect that having only one word in the grammar's acceptance list is what provokes all the false positives.
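For context, here is a minimal sketch of the kind of confidence gate described above. The class name `CueMatcher`, the method `ShouldAdvance`, and the case-insensitive comparison are assumptions made for illustration; they are not part of the original code.

```csharp
using System;

public static class CueMatcher
{
    // Hypothetical helper: returns true only when the recognized text is the cue
    // AND the engine's confidence (0.0 to 1.0) meets the caller's minimum.
    public static bool ShouldAdvance(string recognizedText, float confidence,
                                     string cue, float minConfidence)
    {
        if (recognizedText == null || string.IsNullOrWhiteSpace(cue))
            return false;

        // Assumption: cues are compared ignoring case and surrounding whitespace.
        bool sameText = string.Equals(recognizedText.Trim(), cue.Trim(),
                                      StringComparison.OrdinalIgnoreCase);
        return sameText && confidence >= minConfidence;
    }
}
```

Inside a SpeechRecognized handler this could be called as `CueMatcher.ShouldAdvance(e.Result.Text, e.Result.Confidence, cue, 0.9f)`, where `0.9f` is an arbitrary starting value to tune by experiment.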
Upvotes: 3
Views: 2729
Reputation: 746
Here is what I came up with:
As @Michael Levy said, the computer doesn't do much work when you give it only one word to listen for. It basically just listens for the audio level to hit a certain value, then assumes it must be that word. So I decided to give it other words that SOUND opposite. My goal was not to spend weeks researching phonetics to find a perfect algorithm for picking words that sound far from the word I am trying to match, so I decided to focus on the first letter. The order of operations is: take the first letter of the cue, look up the letters that sound least like it, and add words beginning with those letters to the grammar alongside the cue.
To determine the opposite letters, I posted a question here, but it got shut down before I got any useful advice. I don't know why; I checked the FAQ and it seemed to fit the terms described there. So I polled my family and friends, and our combined brainpower came up with a list of opposites. Each letter has 3 letters that sound as far from the original letter's sound as possible.
The last step was to find words for each of these letters. I found four words per letter, for a total of 104 words. I wanted words of varying length, second letter, and end sound, so that I could cover all my bases and "distract" the computer away from the target word as much as possible. I used this University Vocab List to come up with big words, used my puny English-mind to come up with words under 5 letters, and in the end I felt I had a good list. I formatted it in XML, added the parsing code, and checked the results. Much better! Almost too good: no false positives, but somebody with poor articulation will have a hard time using my program. I will make it a little easier, perhaps by reducing the number of distraction words, but overall I was very pleased with the results, and I appreciate the suggestions by @Michael Levy and @Kevin Junghans.
Code:
<?xml version="1.0" encoding="utf-8" ?>
<list>
<a opposite="m,q,n">abnegate,apple,argent,axe</a>
<b opposite="k,l,s">berate,barn,bored,battology</b>
<c opposite="v,r,j">chrematophobia,cremate,cease,camouflage</c>
<d opposite="l,q,w">dyslogy,distemper,dog,diligent</d>
<e opposite="j,n,k">exoteric,esoteric,enumerate,elongate</e>
<f opposite="g,i,t">flagitious,flatulate,fart,funeral</f>
<g opposite="f,v,z">gracile,grace,garner,guns</g>
<h opposite="q,d,x">hebetate,health,habitat,horned</h>
<i opposite="m,n,f">isomorphic,inside,iterate,ill</i>
<j opposite="c,e,x">jape,juvenescent,jove,jolly</j>
<k opposite="l,w,v">kinetosis,keratin,knack,kudos</k>
<l opposite="b,d,g">lactate,lord,limaceous,launder</l>
<m opposite="v,i,f">malaria,mere,morbid,murcid</m>
<n opposite="h,r,v">name,nemesis,noon,nuncheon</n>
<o opposite="b,n,j">orarian,opiate,opossum,oculars</o>
<p opposite="n,m,d">pharmacist,phylogeny,pelt,puny</p>
<q opposite="d,h,f">query,quack,quick,quisquous</q>
<r opposite="c,f,x">random,renitency,roinous,run</r>
<s opposite="b,y,d">sand,searing,sicarian,solemn</s>
<t opposite="l,m,f">tart,treating,thunder,thyroid</t>
<u opposite="f,g,j">unasinous,unit,ulcer,unthinkable</u>
<v opposite="c,k,m">version,visceral,vortex,vulnerable</v>
<w opposite="d,k,n">wand,weasiness,whimsical,wolf</w>
<x opposite="m,l,p">xanthopsia,xanthax,xylophone,xray</x>
<y opposite="s,j,d">yellow,york,yuck,ylem</y>
<z opposite="m,n,g">zamboni,zip,zoology,zugzwang</z>
</list>
Parsing code:
// Letter -> the three letters judged to sound least like it.
private Dictionary<string, List<string>> opposites;
// Letter -> distraction words that start with that letter.
private Dictionary<string, List<string>> words = new Dictionary<string, List<string>>();

private void StartSpeechRecognition(Media_Slide slide)
{
    // Load buzzlist.xml once, on first use.
    if (opposites == null)
    {
        opposites = new Dictionary<string, List<string>>();
        System.Xml.XmlDocument doc = new System.Xml.XmlDocument();
        // buzzlist.xml is expected to sit next to the application's assembly.
        string dir = System.IO.Path.GetDirectoryName(typeof(MainWindow).Assembly.Location);
        string file = System.IO.Path.Combine(dir, "buzzlist.xml");
        doc.Load(file);
        // DocumentElement is the <list> root; each child element is one letter.
        foreach (System.Xml.XmlNode node in doc.DocumentElement.ChildNodes)
        {
            opposites.Add(node.Name, new List<string>(node.Attributes["opposite"].Value.Split(',')));
            words.Add(node.Name, new List<string>(node.InnerText.Split(',')));
        }
    }
    speechSlide = slide;
    rec = new SpeechRecognitionEngine();
    rec.SpeechRecognized += rec_SpeechRecognized;
    rec.SetInputToDefaultAudioDevice();
    try
    {
        rec.LoadGrammar(GetGrammar());
        rec.RecognizeAsync(RecognizeMode.Multiple);
    }
    catch
    {
        // The grammar could not be loaded (for example, the slide has no cue);
        // recognition simply stays off for this slide.
    }
}
Checking code:
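void rec_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    // Distraction words are recognized too; only the slide's own cue advances.
    if (e.Result.Text == speechSlide.Scenes[speechSlide.currentslide].speechCue)
    {
        rec.UnloadAllGrammars();
        ScreenSettings.NextSlide(speechSlide);
        try
        {
            // Build and load the grammar for the new current slide.
            rec.LoadGrammar(GetGrammar());
        }
        catch
        {
            // No grammar for the next slide: stop listening.
            rec.RecognizeAsyncCancel();
        }
    }
}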
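GetGrammar itself is not shown above. As a rough sketch of the word-selection step only, assuming the `opposites` and `words` dictionaries are filled as in the parsing code, it might look something like this. The class name `DistractorList` and the method `Build` are made up for illustration, and the real method may differ.

```csharp
using System;
using System.Collections.Generic;

public static class DistractorList
{
    // Hypothetical helper: returns the cue followed by the distraction words
    // for every letter listed as an "opposite" of the cue's first letter.
    public static List<string> Build(
        string cue,
        IDictionary<string, List<string>> opposites,
        IDictionary<string, List<string>> words)
    {
        var result = new List<string>();
        if (string.IsNullOrWhiteSpace(cue))
            return result; // no cue on this slide, nothing to listen for

        cue = cue.Trim();
        result.Add(cue);

        string first = cue.Substring(0, 1).ToLowerInvariant();
        List<string> oppositeLetters;
        if (!opposites.TryGetValue(first, out oppositeLetters))
            return result; // cue starts with a digit or symbol: no distractors

        foreach (string letter in oppositeLetters)
        {
            List<string> candidates;
            if (!words.TryGetValue(letter.Trim(), out candidates))
                continue;

            foreach (string candidate in candidates)
            {
                string word = candidate.Trim();
                // Skip blanks (e.g. from a stray comma), the cue itself, and repeats.
                if (word.Length > 0
                    && !string.Equals(word, cue, StringComparison.OrdinalIgnoreCase)
                    && !result.Contains(word))
                {
                    result.Add(word);
                }
            }
        }
        return result;
    }
}
```

The returned list can then be turned into a grammar with `new Grammar(new GrammarBuilder(new Choices(list.ToArray())))`. Keeping the cue first and skipping blank entries matters because an empty phrase is not a usable choice for the recognizer.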
Upvotes: 3
Reputation: 13297
Recognizer results can vary with many factors, including background noise, microphone quality, and audio input settings and levels. Try a quiet room with a good microphone and see if your results are better.
Your theory of a one-word-grammar causing problems may be fair. (It reminds me of a teacher asking a multiple choice question on a test with only one choice, then being surprised when so many students got the answer correct.) Have you tried adding in junk words as other choices in the grammar so that the engine won't just default to the one and only choice? Try something like:
choices.Add("zebra");
choices.Add("umbrella");
choices.Add("plunger");
and see if your results improve.
I know that in Windows 7, with the Dictation grammar, you can use the Windows 7 Speech Recognition features to train the recognizer to better recognize a single speaker. I don't know whether this helps with a fixed grammar like the one you've described. You may want to experiment with training to see if the results improve. See http://windows.microsoft.com/en-US/windows7/Set-up-Speech-Recognition for more info.
Upvotes: 3