Athanatos

Reputation: 1089

Performance of grxml spell and phonetic grammar

The following grammar performs dreadfully. I was wondering whether something is wrong with the grammar itself and, if so, how it can be improved.

This is with the ancient Nuance 8.5, so might the problem be the performance of the recognizer itself?

Using nl-tool (the equivalent of parsetool in Nuance 9), I can see that when we use phonetic and spelled input in the GUI tool, we get two interpretations (out). We should only get one, and I'm not sure why we don't; maybe the syntax of the grammar is wrong as well. But even plain spelling, which gives a single interpretation in the command-line tool, works terribly.

  <?xml version="1.0" encoding="UTF-8"?>
    <grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-GB" version="1.0" root="TOPLEVEL" mode="voice">
      <rule id="TOPLEVEL" scope="public">
        <item repeat="0-1">
          <ruleref uri="#Preamble"/>
        </item>
        <one-of>
          <item> start again </item>
          <item repeat="2-15">
            <one-of>
              <item>
                <one-of>
                  <item>double</item>
                  <item>twice</item>
                  <item>two times</item>
                </one-of>
                <ruleref uri="#Spell_Alpha"/>
                <tag>assign(alphanum strcat($alphanum strcat($return $return)))</tag>
              </item>
              <item>
                <ruleref uri="#Spell_Alpha"/>
                <tag>assign(alphanum strcat($alphanum $return))</tag>
              </item>
            </one-of>
          </item>
        </one-of>
        <tag><![CDATA[<out $alphanum >]]></tag>
      </rule>
      <rule id="Prepositions" scope="public">
        <item>
          <one-of>
            <item repeat="0-1">for</item>
            <item repeat="0-1">as in</item>
            <item repeat="0-1">as</item>
            <item repeat="0-1">like</item>
          </one-of>
        </item>
      </rule>
      <rule id="Spell_Alpha" scope="public">
        <item>
          <one-of>
            <item weight="1.9">
              <item>
                <ruleref uri="#LETTERS"/>
              </item>
            </item>
            <item weight="0.6">
              <item repeat="0-1">the</item>
              letter
              <item repeat="1">
                <ruleref uri="#LETTERS"/>
              </item>
            </item>
            <item weight="0.6">
              <item>
                <ruleref uri="#LETTERS"/>
              </item>
              <item repeat="1">
                <ruleref uri="#Prepositions"/>
              </item>
              <item>
                <ruleref uri="#PHONETICS_BASIC"/>
              </item>
            </item>
            <item>
              <item weight="2.0">
                <ruleref uri="#PHONETICS_BASIC"/>
              </item>
            </item>
          </one-of>
        </item>
        <tag> return($return)</tag>
      </rule>
      <rule id="LETTERS" scope="public">
        <item>
          <one-of>
            <item weight="1.584"> ay <tag> return("a") </tag></item>
            <item weight="1.584"> eh <tag> return("a") </tag></item>
            <item weight="1.584"> a <tag> return("a") </tag></item>
            <item weight="1.584"> be<tag> return("b") </tag></item>
            <item weight="1.166"> bee <tag> return("b") </tag></item>
            <item weight="1.222"> sea <tag> return("c") </tag></item>
            <item weight="1.222"> see  <tag> return("c") </tag></item>
            <item weight="1.229">dee<tag> return("d") </tag></item>
            <item weight="1.639">ee<tag> return("e") </tag></item>
            <item weight="1.072">eff<tag> return("f") </tag></item>
            <item weight="1.072"> ef<tag> return("f") </tag></item>
            <item weight="1.072">f<tag> return("f") </tag></item>
            <item weight="1.160"> gee <tag> return("g") </tag></item>
            <item weight="1.160">g <tag> return("g") </tag></item>
            <item weight="1.274">  h <tag> return("h") </tag></item>
            <item weight="1.274"> aych <tag> return("h") </tag></item>
            <item weight="1.274"> haych <tag> return("h") </tag></item>
            <item weight="1.384"> eye <tag> return("i") </tag></item>
            <item weight="1.040"> jay <tag> return("j") </tag></item>
            <item weight="1.146">  kay <tag> return("k") </tag></item>
            <item weight="1.146">  cay <tag> return("k") </tag></item>
            <item weight="1.459"> elle <tag> return("l") </tag></item>
            <item weight="1.459"> ell <tag> return("l") </tag></item>
            <item weight="1.459"> el <tag> return("l") </tag></item>
            <item weight="1.230">  m <tag> return("m") </tag></item>
            <item weight="1.230">  em <tag> return("m") </tag> </item>
            <item weight="1.510"> in <tag> return("n") </tag></item>
            <item weight="1.510"> en <tag> return("n") </tag></item>
            <item weight="1.510">n <tag> return("n") </tag></item>
            <item weight="1.510"> inn <tag> return("n") </tag></item>
            <item weight="1.489"> oh <tag> return("o") </tag></item>
            <item weight="1.489"> owe <tag> return("o") </tag></item>
            <item weight="1.107">  pea <tag> return("p") </tag></item>
            <item weight="1.107">  pee <tag> return("p") </tag></item>
            <item weight="1.004"> queue <tag> return("q") </tag></item>
            <item weight="1.004">  cue <tag> return("q") </tag></item>
            <item weight="1.534">  are <tag> return("r") </tag></item>
            <item weight="1.424">   s <tag> return("s") </tag></item>
            <item weight="1.331">   tea <tag> return("t") </tag></item>
            <item weight="1.331">  tee <tag> return("t") </tag></item>
            <item weight="1.139"> you <tag> return("u") </tag></item>
            <item weight="1.054"> vee <tag> return("v") </tag></item>
            <item weight="1.054"> v <tag> return("v") </tag></item>
            <item weight="1.166"> double you <tag> return("w") </tag></item>
            <item weight="1.166"> doubleyou<tag> return("w") </tag></item>
            <item weight="1.166"> w<tag> return("w") </tag></item>
            <item weight="1.010"> x <tag> return("x") </tag></item>
            <item weight="1.010"> ex <tag> return("x") </tag></item>
            <item weight="1.010"> ehks <tag> return("x") </tag></item>
            <item weight="1.147">  why <tag> return("y") </tag></item>
            <item weight="1.025">  z <tag> return("z") </tag></item>
            <item weight="1.025"> zee <tag> return("z") </tag></item>
            <item weight="1.025"> zed <tag> return("z") </tag></item>
          </one-of>
        </item>
      </rule>
      <rule id="PHONETICS_BASIC" scope="public">
        <item>
          <one-of>
            <item> alpha <tag> return("a") </tag></item>
            <item> alfa <tag> return("a") </tag></item>
            <item>alice<tag> return("a") </tag></item>
            <item> bravo <tag> return("b") </tag></item>
            <item> charlie <tag> return("c") </tag></item>
            <item> delta <tag> return("d") </tag></item>
            <item> echo <tag> return("e") </tag></item>
            <item> foxtrot <tag> return("f") </tag></item>
            <item> freddie <tag> return("f") </tag></item>
            <item> freddy <tag> return("f") </tag></item>
            <item> golf <tag> return("g") </tag></item>
            <item> hotel <tag> return("h") </tag></item>
            <item> indigo <tag> return("i") </tag></item>
            <item> india <tag> return("i") </tag></item>
            <item> juliet <tag> return("j") </tag></item>
            <item> john <tag> return("j") </tag></item>
            <item> kilo <tag> return("k") </tag></item>
            <item>lima <tag> return("l") </tag></item>
            <item> mike <tag> return("m") </tag></item>
            <item> mother <tag> return("m") </tag></item>
            <item> november <tag> return("n") </tag></item>
            <item> oscar <tag> return("o") </tag></item>
            <item>  oliver <tag> return("o") </tag></item>
            <item> papa <tag> return("p") </tag></item>
            <item> pappa <tag> return("p") </tag></item>
            <item> quebec <tag> return("q") </tag></item>
            <item> queen <tag> return("q") </tag></item>
            <item> romeo <tag> return("r") </tag></item>
            <item> roger <tag> return("r") </tag></item>
            <item> robert <tag> return("r") </tag></item>
            <item> sierra <tag> return("s") </tag></item>
            <item>sugar <tag> return("s") </tag></item>
            <item> tango <tag> return("t") </tag></item>
            <item>  uniform <tag> return("u") </tag></item>
            <item> victor <tag> return("v") </tag></item>
            <item> whiskey <tag> return("w") </tag></item>
            <item> william <tag> return("w") </tag></item>
            <item> ex ray <tag> return("x") </tag></item>
            <item> yankee <tag> return("y") </tag></item>
            <item> yellow <tag> return("y") </tag></item>
            <item> zulu <tag> return("z") </tag></item>
            <item> zero <tag> return("z") </tag></item>
            <item> zebra <tag> return("z") </tag></item>
          </one-of>
        </item>
      </rule>
      <rule id="Preamble">
        <one-of>
          <item weight="0.2">right</item>
          <item weight="0.2">alright my surname's mrs</item>
        </one-of>
      </rule>
    </grammar>

Upvotes: 1

Views: 839

Answers (1)

Jim Rush

Reputation: 4163

You're asking the recognition engine to perform a task that it just doesn't do very well: recognizing a highly variable-length list of short words (letters in this case). The Nuance engine, in my experience, didn't do this very well. I'm not sure which engines available today, if any, would be better; I haven't experimented enough. Some of the newer speaker-independent dictation engines might have a better chance.

Some things that might help:

  • If there is some pattern or logic (e.g. words, names) behind the text, and you have enough samples, a Statistical Language Model (SLM) might fare better. Given that it looks like you might be supporting a name, this is an approach I've used before. Accuracy is still significantly lower than with normal grammars, but it gives you a fighting chance. (I built a first name and surname capture: one as a static grammar of spelling and saying the name, and the other as an SLM of just spelling the name. Both were built from the same census data, and both had similar accuracies. If I used one and then used the other as a fallback, I was getting around 75% task success rates with a slightly older version of the Nuance recognition engine.)
  • If you can get your users to use words (e.g. alpha) instead of letters, you increase the number of sounds that can be used to match the correct input.
  • Decrease the variability in the length. Not only is it difficult for the engine to separate the short sounds from noise, you'll also find that the recognizer uses significantly more CPU to separate those sounds than it does for normal short inputs.
  • If you can build native grammars and adjust native tuning parameters, you can use the tuning trade-offs in the system to spend more CPU and time on better recognition. For the way you've currently structured the solution, though, I don't think any amount of additional resources will be enough for the way that engine operates.
  • Remove some of the pronunciations. I suspect you aren't gaining accuracy with them, but I'd have to run samples both with and without the expanded grammar/pronunciation options.
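
As a rough illustration of the points about using words, reducing length variability, and trimming pronunciations, a cut-down grammar might look like the sketch below. It is untested on Nuance 8.5: the tag expressions are copied unchanged from the grammar in the question, the 4-8 repeat range is only an assumed example (pick whatever range fits your real data), and keeping exactly one word per letter is an illustrative choice rather than a tuned list.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-GB" version="1.0" root="TOPLEVEL" mode="voice">
  <!-- Phonetic words only, with a narrower repeat range than 2-15.
       The 4-8 range is an assumption for illustration. -->
  <rule id="TOPLEVEL" scope="public">
    <item repeat="4-8">
      <ruleref uri="#PHONETICS_BASIC"/>
      <!-- Same concatenation tag as in the question's grammar -->
      <tag>assign(alphanum strcat($alphanum $return))</tag>
    </item>
    <tag><![CDATA[<out $alphanum >]]></tag>
  </rule>
  <!-- One word per letter, so no alternative pronunciations compete -->
  <rule id="PHONETICS_BASIC" scope="public">
    <one-of>
      <item> alpha <tag> return("a") </tag></item>
      <item> bravo <tag> return("b") </tag></item>
      <item> charlie <tag> return("c") </tag></item>
      <item> delta <tag> return("d") </tag></item>
      <item> echo <tag> return("e") </tag></item>
      <item> foxtrot <tag> return("f") </tag></item>
      <item> golf <tag> return("g") </tag></item>
      <item> hotel <tag> return("h") </tag></item>
      <item> india <tag> return("i") </tag></item>
      <item> juliet <tag> return("j") </tag></item>
      <item> kilo <tag> return("k") </tag></item>
      <item> lima <tag> return("l") </tag></item>
      <item> mike <tag> return("m") </tag></item>
      <item> november <tag> return("n") </tag></item>
      <item> oscar <tag> return("o") </tag></item>
      <item> papa <tag> return("p") </tag></item>
      <item> quebec <tag> return("q") </tag></item>
      <item> romeo <tag> return("r") </tag></item>
      <item> sierra <tag> return("s") </tag></item>
      <item> tango <tag> return("t") </tag></item>
      <item> uniform <tag> return("u") </tag></item>
      <item> victor <tag> return("v") </tag></item>
      <item> whiskey <tag> return("w") </tag></item>
      <item> ex ray <tag> return("x") </tag></item>
      <item> yankee <tag> return("y") </tag></item>
      <item> zulu <tag> return("z") </tag></item>
    </one-of>
  </rule>
</grammar>
```

Running the same recorded samples through this and through the full grammar would show whether the extra pronunciations and the wide repeat range are actually costing you accuracy.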

Upvotes: 1
