Performance of grxml spell and phonetic grammar

Question

The following grammar performs dreadfully. I was wondering whether there is something wrong with the grammar itself and, if so, how it can be improved.

This is with the ancient Nuance 8.5, so could the problem be the performance of the recognizer itself?

Using nl-tool (the equivalent of parsetool in Nuance 9), I can see that when both the phonetic and spell rules are used in the GUI tool, we get two interpretations (out). We should only get one, and I am not sure why; maybe the syntax of the grammar is wrong as well. But even a spell-only grammar, which gives a single interpretation in the command-line tool, works terribly.

  
    
      
        
          
        
        
           start again 
          
            
              
                
                  double
                  twice
                  two times
                
                
                assign(alphanum strcat($alphanum strcat($return  $return ))))
              
              
                
                assign(alphanum strcat($alphanum $return))
              
            
          
        
        ]]>
      
      
        
          
            for
            as in
            as
            like
          
        
      
      
        
          
            
              
                
              
            
            the

     



                        letter



                            

    
            
              
                
              
              
                
              
              
                
              
            
            
              
                
              
            
          
        
         return($return)
      
      
        
          
             ay  return("a") 
             eh  return("a") 
             a  return("a") 
             be return("b") 
             bee  return("b") 
             sea  return("c") 
             see   return("c") 
            dee return("d") 
            ee return("e") 
            eff return("f") 
             ef return("f") 
            f return("f") 
             gee  return("g") 
            g  return("g") 
              h  return("h") 
             aych  return("h") 
             haych  return("h") 
             eye  return("i") 
             jay  return("j") 
              kay  return("k") 
              cay  return("k") 
             elle  return("l") 
             ell  return("l") 
             el  return("l") 
              m  return("m") 
              em  return("m")  
             in  return("n") 
             en  return("n") 
            n  return("n") 
             inn  return("n") 
             oh  return("o") 
             owe  return("o") 
              pea  return("p") 
              pee  return("p") 
             queue  return("q") 
              cue  return("q") 
              are  return("r") 
               s  return("s") 
               tea  return("t") 
              tee  return("t") 
             you  return("u") 
             vee  return("v") 
             v  return("v") 
             double you  return("w") 
             doubleyou return("w") 
             w return("w") 
             x  return("x") 
             ex  return("x") 
             ehks  return("x") 
              why  return("y") 
              z  return("z") 
             zee  return("z") 
             zed  return("z") 
          
        
      
      
        
          
             alpha  return("a") 
             alfa  return("a") 
            alice return("a") 
             bravo  return("b") 
             charlie  return("c") 
             delta  return("d") 
             echo  return("e") 
             foxtrot  return("f") 
             freddie  return("f") 
             freddy  return("f") 
             golf  return("g") 
             hotel  return("h") 
             indigo  return("i") 
             india  return("i") 
             juliet  return("j") 
             john  return("j") 
             kilo  return("k") 
            lima  return("l") 
             mike  return("m") 
             mother  return("m") 
             november  return("n") 
             oscar  return("o") 
              oliver  return("o") 
             papa  return("p") 
             pappa  return("p") 
             quebec  return("q") 
             queen  return("q") 
             romeo  return("r") 
             roger  return("r") 
             robert  return("r") 
             sierra  return("s") 
            sugar  return("s") 
             tango  return("t") 
              uniform  return("u") 
             victor  return("v") 
             whiskey  return("w") 
             william  return("w") 
             ex ray  return("x") 
             yankee  return("y") 
             yellow  return("y") 
             zulu  return("z") 
             zero  return("z") 
             zebra  return("z") 
          
        
      
      
        
          right
          alright my surname's mrs

Jim Rush · Accepted Answer

You're asking the recognition engine to perform a task that it just doesn't do very well: recognizing a highly variable-length list of short words (letters in this case). The Nuance engine, in my experience, didn't do this very well. I'm not sure which engines available today, if any, would do better; I haven't experimented enough to say. Some of the newer, speaker-independent dictation engines might have a better chance.

Some things that might help:

If there is some logical pattern (i.e. words, names) behind the text, and you have enough samples, a Statistical Language Model (SLM) might fare better. Given it looks like you might be supporting a name, this is an approach I've used before. Accuracy is still significantly lower than with normal grammars, but it gives you a fighting chance. (I built a first name and surname capture: one as a static grammar of spelling and saying the name, the other as an SLM of just spelling the name. Both were built from the same census data and had similar accuracies. If I used one and then the other as a fallback, I was getting around 75% task success rates with a slightly older version of the Nuance recognition engine.)
If you can get your users to use words (e.g. alpha) instead of letters, you increase the number of sounds that can be used to match the correct input.
Decrease the variability in the length. Not only is it difficult for the engine to separate the short sounds from noise, you'll find that the recognizer uses significantly more CPU to separate those sounds than it does for normal, short inputs.
If you can build native grammars and adjust native tuning parameters, you can use the tuning trade-offs in the system to spend more CPU and time on better recognition. For the way you've currently structured the solution, I don't think any amount of additional resources will be enough for the way that engine operates.
Remove some of the pronunciations. I suspect you aren't gaining accuracy with them, but to be sure I'd have to run samples both with and without the expanded grammar/pronunciation options.
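
As a rough illustration of the last few suggestions (phonetic words instead of bare letters, a bounded length, and fewer alternatives per letter), a trimmed grammar might be structured like the sketch below. It uses standard W3C SRGS XML elements. The tag bodies are copied from the question's grammar rather than checked against Nuance 8.5, and the rule ids, the 2-12 length bound, the en-GB language and the one-word-per-letter list are all assumptions made for illustration. Initialization of alphanum and the final slot-filling tag are left out because they are not visible in the question.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only. Rule ids, the 2-12 bound, xml:lang and the word list
     are illustrative assumptions, not values from the original grammar. -->
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         version="1.0" xml:lang="en-GB" mode="voice" root="SPELLING">

  <rule id="SPELLING" scope="public">
    <!-- Bounded repeat instead of an open-ended loop, to reduce
         the variability in utterance length. -->
    <item repeat="2-12">
      <ruleref uri="#PHONETIC"/>
      <!-- Tag body copied from the question's grammar. -->
      <tag>assign(alphanum strcat($alphanum $return))</tag>
    </item>
  </rule>

  <!-- One phonetic word per letter: longer words give the recognizer
       more sound to match, and fewer alternatives means less confusion. -->
  <rule id="PHONETIC">
    <one-of>
      <item>alpha <tag>return("a")</tag></item>
      <item>bravo <tag>return("b")</tag></item>
      <item>charlie <tag>return("c")</tag></item>
      <item>delta <tag>return("d")</tag></item>
      <item>echo <tag>return("e")</tag></item>
      <item>foxtrot <tag>return("f")</tag></item>
      <item>golf <tag>return("g")</tag></item>
      <item>hotel <tag>return("h")</tag></item>
      <item>india <tag>return("i")</tag></item>
      <item>juliet <tag>return("j")</tag></item>
      <item>kilo <tag>return("k")</tag></item>
      <item>lima <tag>return("l")</tag></item>
      <item>mike <tag>return("m")</tag></item>
      <item>november <tag>return("n")</tag></item>
      <item>oscar <tag>return("o")</tag></item>
      <item>papa <tag>return("p")</tag></item>
      <item>quebec <tag>return("q")</tag></item>
      <item>romeo <tag>return("r")</tag></item>
      <item>sierra <tag>return("s")</tag></item>
      <item>tango <tag>return("t")</tag></item>
      <item>uniform <tag>return("u")</tag></item>
      <item>victor <tag>return("v")</tag></item>
      <item>whiskey <tag>return("w")</tag></item>
      <item>ex ray <tag>return("x")</tag></item>
      <item>yankee <tag>return("y")</tag></item>
      <item>zulu <tag>return("z")</tag></item>
    </one-of>
  </rule>
</grammar>
```

Whether a trimmed grammar like this actually improves accuracy would still need to be measured against recorded samples, with and without the extra alternatives.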
