hengist

How to remove a LUIS entity marker from an utterance

I am using LUIS to determine which state a customer lives in. I have set up a list entity called "state" that contains the 50 states, with their two-letter abbreviations as synonyms, as described in the documentation. LUIS is returning certain two-letter words, such as "hi" or "in", as state entities.

I have set up an intent with example utterances such as "My state is Oregon", "I am from WA", etc. Inside the intent, if the word "in" appears in an utterance, as in "I live in Kentucky", LUIS automatically marks "in" as a state entity, and I am unable to remove that marker.

Below is a snippet of the LUIS JSON response to the utterance "I live in Kentucky". As you can see, the response includes both Indiana and Kentucky as entities, when it should include only Kentucky.

  "query": "I live in Kentucky",
  "topScoringIntent": {
    "intent": "STATE_INQUIRY",
    "score": 0.9338141
  },
  ....
  "entities": [
    ....
    {
      "entity": "in",
      "type": "state",
      "startIndex": 7,
      "endIndex": 8,
      "resolution": {
        "values": [
          "indiana"
        ]
      }
    },
    {
      "entity": "kentucky",
      "type": "state",
      "startIndex": 10,
      "endIndex": 17,
      "resolution": {
        "values": [
          "kentucky"
        ]
      }
    }
  ],
  ....
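To see the problem from the consuming side, here is a minimal Python sketch that reads the "state" entities out of a response like the one above. The "...." elisions are dropped so the JSON parses, and `state_values` is a hypothetical helper, not part of any LUIS SDK. A naive consumer gets both Indiana and Kentucky back:

```python
import json

# The response from the question, with the elided ("....") parts removed so it
# parses as JSON. Only the fields used below are kept.
RESPONSE = """
{
  "query": "I live in Kentucky",
  "topScoringIntent": {"intent": "STATE_INQUIRY", "score": 0.9338141},
  "entities": [
    {"entity": "in", "type": "state", "startIndex": 7, "endIndex": 8,
     "resolution": {"values": ["indiana"]}},
    {"entity": "kentucky", "type": "state", "startIndex": 10, "endIndex": 17,
     "resolution": {"values": ["kentucky"]}}
  ]
}
"""


def state_values(response: dict) -> list[str]:
    """Return the normalized value of every entity of type "state"."""
    values = []
    for entity in response.get("entities", []):
        if entity.get("type") == "state":
            values.extend(entity["resolution"]["values"])
    return values


parsed = json.loads(RESPONSE)

# startIndex and endIndex are both inclusive ("in" spans 7..8), so the matched
# text in the original query is query[startIndex:endIndex + 1].
for entity in parsed["entities"]:
    start, end = entity["startIndex"], entity["endIndex"]
    print(parsed["query"][start:end + 1], "->", entity["resolution"]["values"])

print(state_values(parsed))  # ['indiana', 'kentucky']
```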

How do I train LUIS not to mark the words "in" and "hi" as states in this context if I can't remove the entity marker from the utterance?

Answers (2)

hengist

@StevenKanberg's answer was very helpful but unfortunately not complete for my situation. I tried both geographyV2 and Places.AbsoluteLocation (separately). Neither works entirely the way I need it to: recognizing states and their two-letter abbreviations in a way that can be queried from the entities in the response.

So my choices are:

  1. Create my own list entity of states, with the state name and its two-letter abbreviation as synonyms, as described in the list entity documentation. This works except for two-letter abbreviations that are also words, such as "in", "hi" and "me".
  2. Use the geographyV2 prebuilt entity, which does not allow synonyms and does not recognize two-letter abbreviations at all, or
  3. Use Places.AbsoluteLocation, which does recognize two-letter state abbreviations and does not confuse them with words, but also captures every location, including cities, countries and addresses, without differentiating between them. That leaves me no way of telling which entity is the state in an utterance like "I live in Lake Stevens, Snohomish County, WA".

Solution: If I combine 1 with 3, I can look for text that has been tagged with both entity types. If LUIS marks the word "in" as a state (Indiana), I can then check whether that word has also been flagged as a Places.AbsoluteLocation. If it has not, I can safely discard that entity. It's not ideal, but it is a workaround that solves the problem.
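A minimal Python sketch of that cross-check, assuming the v2 response shape shown in the question and assuming the prebuilt domain entity is reported with type "Places.AbsoluteLocation" (verify this against your own app's output). `confirmed_states` and `overlaps` are hypothetical helpers, and the location entity in the sample data is invented for illustration, since the question only shows the two "state" matches:

```python
from typing import Any

Entity = dict[str, Any]

# Assumption: the type name under which the prebuilt domain entity appears in
# the response. Check it against a real response from your app.
LOCATION_TYPE = "Places.AbsoluteLocation"
STATE_TYPE = "state"


def overlaps(a: Entity, b: Entity) -> bool:
    """True if two entity spans share at least one character.

    startIndex and endIndex are both inclusive ("in" spans 7..8).
    """
    return a["startIndex"] <= b["endIndex"] and b["startIndex"] <= a["endIndex"]


def confirmed_states(entities: list[Entity]) -> list[str]:
    """Keep a list-entity state match only if a location entity covers it."""
    locations = [e for e in entities if e.get("type") == LOCATION_TYPE]
    states = []
    for entity in entities:
        if entity.get("type") != STATE_TYPE:
            continue
        # Discard matches such as "in" -> indiana that no location entity backs up.
        if any(overlaps(entity, loc) for loc in locations):
            states.extend(entity["resolution"]["values"])
    return states


# Hypothetical entities for "I live in Kentucky": the two list matches from the
# question plus an assumed location match on "kentucky" only.
entities = [
    {"entity": "in", "type": "state", "startIndex": 7, "endIndex": 8,
     "resolution": {"values": ["indiana"]}},
    {"entity": "kentucky", "type": "state", "startIndex": 10, "endIndex": 17,
     "resolution": {"values": ["kentucky"]}},
    {"entity": "kentucky", "type": LOCATION_TYPE, "startIndex": 10, "endIndex": 17},
]

print(confirmed_states(entities))  # ['kentucky']
```

An overlap test is used rather than exact span equality because a location match may cover more than the state itself, as in "Lake Stevens, Snohomish County, WA". If your location entities turn out to match state spans exactly, a stricter equality check would also work.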

Steven Kanberg

In this particular case (populating a list entity with state abbreviations/names), you would be better served by the geographyV2 prebuilt entity or the Places.AbsoluteLocation prebuilt domain entity. (Please note that at the time of this writing, the geographyV2 prebuilt entity has a slight bug, so the prebuilt domain entity is the better option.)

The reason for this is twofold:

One, geographic locations are already built into LUIS, and they don't collide with regular words like "in", "hi", or "me". I tested this in reverse by creating a [Medical] list entity that contained "ct" as the normalized value and "ct scan" as a synonym. When I typed "get me a ct in CT", it resulted in "get me a [Medical] in [Medical]". To fix this, I selected the second "CT" and reassigned it to the Places.AbsoluteLocation entity. After retraining, I tested "when in CT show me ct options", which correctly resulted in "when in [Places.AbsoluteLocation] show me [Medical] options". Further examples and training will refine the results.

Two, list entities work best when several disparate words can all refer to one canonical value. This tutorial shows a simple example in which loosely associated words are assigned as synonyms to a canonical name (normalized value).
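The collisions in both directions follow from how a list entity is defined: a canonical (normalized) value plus a set of synonyms, matched as literal text. The Python sketch below is a deliberately simplified model of that idea, not LUIS's actual matcher; `STATE_LIST`, `MEDICAL_LIST` and `match_list` are hypothetical names, and only three states are included. It shows why "in" resolves to Indiana and why "ct" is tagged twice in "get me a ct in CT":

```python
# A simplified model of a list entity: each canonical (normalized) value owns a
# set of synonyms, and any case-insensitive exact match of a synonym in the
# utterance is tagged with that canonical value. Illustration only; this is not
# how LUIS tokenizes or matches internally.
STATE_LIST = {
    "indiana": ["indiana", "in"],
    "kentucky": ["kentucky", "ky"],
    "hawaii": ["hawaii", "hi"],
}
MEDICAL_LIST = {"ct": ["ct", "ct scan"]}


def build_index(entity_list: dict[str, list[str]]) -> dict[tuple[str, ...], str]:
    """Map each synonym (as a tuple of lowercase tokens) to its canonical value."""
    index = {}
    for canonical, synonyms in entity_list.items():
        for synonym in synonyms:
            index[tuple(synonym.lower().split())] = canonical
    return index


def match_list(utterance: str, entity_list: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Return (lowercased matched text, canonical value) pairs.

    Splits on whitespace only and prefers the longest synonym at each position.
    """
    index = build_index(entity_list)
    longest = max((len(key) for key in index), default=0)
    tokens = utterance.lower().split()
    matches = []
    i = 0
    while i < len(tokens):
        for n in range(min(longest, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in index:
                matches.append((" ".join(key), index[key]))
                i += n
                break
        else:
            i += 1  # no synonym starts at this token
    return matches


print(match_list("I live in Kentucky", STATE_LIST))
# [('in', 'indiana'), ('kentucky', 'kentucky')]
print(match_list("get me a ct in CT", MEDICAL_LIST))
# [('ct', 'ct'), ('ct', 'ct')]
```

Because the match in this model is purely lexical, adding example utterances cannot change what gets tagged; context-aware disambiguation has to come from a learned or prebuilt entity, or from post-processing of the response.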

Hope this helps!