Vijay

Reputation: 1088

Entity detection: entities that clash with English words

I have a few sentences like the ones below:

In the sentences above, the entities I'm looking for are IS, IS, and ME, respectively. The possible entity values include IS, ME, AN, and AM, which are also common words in ordinary English sentences. I'm using LUIS for entity detection and maintaining these values as a list entity. The issue is that although LUIS detects the entities (IS, AN, AM) correctly, it also detects them in ordinary sentences such as:

The sentence above contains no entity, but IS is still picked up as one.
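To illustrate why this happens, here is a minimal Python sketch of context-free list matching. It is not LUIS's actual matcher; it only assumes, as the behavior described above suggests, that each token is compared case-insensitively against the list values without looking at the surrounding words. `SCOPE_VALUES` and `naive_list_match` are hypothetical names.

```python
import re

# Hypothetical stand-in for the values stored in the LUIS list entity
SCOPE_VALUES = {"IS", "ME", "AN", "AM"}

def naive_list_match(utterance: str) -> list[str]:
    """Return every token that matches a list value, ignoring case.

    Like a list entity, this looks only at the token itself and never at
    its neighbours, so ordinary English words are matched as well.
    """
    tokens = re.findall(r"[A-Za-z0-9]+", utterance)
    return [t for t in tokens if t.upper() in SCOPE_VALUES]

print(naive_list_match("What is the sales org for IS?"))  # ['is', 'IS']
```

Both the verb "is" and the scope "IS" are returned, which is exactly the false positive described in the question: without context, the two are indistinguishable.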

How can we detect these entities only when they are actually being referred to, and not when they are just part of ordinary sentence construction?

A few points to note:

Upvotes: 2

Views: 333

Answers (2)

Kyle Delaney

Reputation: 12264

You've probably figured out that non-machine-learned entities are not ideal in your case because they don't take context into consideration. I think you have a few options.

Option 1: Simple Entities

I just tested this by adding your three utterances to an intent named "Sales org" and then creating a simple entity named "Scope." I labeled IS, IS, and ME at the ends of those utterances as the Scope entity. When I then tested "give me sales org for fpc 12234 for is?", LUIS correctly identified "is" as the entity and did not identify "me".

After making a call to LUIS, your bot code can then validate the recognized entity to make sure it's within the list of acceptable values.
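As a rough sketch of that validation step in Python: the response shape below is an assumption modeled on the LUIS v2 format (an "entities" array whose items carry "entity" and "type" fields), and `ALLOWED_SCOPES` and `valid_scopes` are hypothetical names, so adjust the field names to whatever your LUIS version actually returns.

```python
import json

# The four acceptable scope values from the question
ALLOWED_SCOPES = {"IS", "ME", "AN", "AM"}

def valid_scopes(luis_result: dict, entity_type: str = "Scope") -> list[str]:
    """Keep only recognized entities of `entity_type` whose text is an allowed value.

    Assumes a LUIS v2-style result: {"entities": [{"entity": ..., "type": ...}, ...]}.
    """
    found = []
    for ent in luis_result.get("entities", []):
        if ent.get("type") != entity_type:
            continue
        value = ent.get("entity", "").strip().upper()
        if value in ALLOWED_SCOPES:
            found.append(value)
    return found

# Hand-written sample response for illustration (not real LUIS output)
sample = json.loads("""
{
  "query": "give me sales org for fpc 12234 for is?",
  "entities": [
    {"entity": "is", "type": "Scope"},
    {"entity": "fpc", "type": "Scope"}
  ]
}
""")
print(valid_scopes(sample))  # ['IS']
```

Because a simple entity is machine-learned, it can occasionally tag text outside your list (like "fpc" in the sample); the whitelist check catches those cases.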

Option 2: Roles

If you still want to use a list entity, you can have LUIS give you contextual information about the entity by using roles.

I just tested this by creating an entity named "ScopeName" with your four values IS, ME, AN, and AM. I then created two roles for that entity: "scope" and "falsePositive." Then I labeled the entities in the "Sales org" utterances like this:

[Screenshot: "Sales org" utterances with ScopeName matches labeled with the "scope" or "falsePositive" role]

If you do this, LUIS will still recognize IS, ME, AN, and AM when they're in the parts of the sentence where you don't want them to be recognized, but you'll know to ignore them in your bot code because they'll be assigned the "falsePositive" role.
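Here is a minimal Python sketch of that filtering in the bot. It assumes a LUIS v2-style response in which each list-entity match carries a "role" field and a "resolution": {"values": [...]} block holding the normalized list value; those field names are assumptions to check against your actual endpoint output, and `scopes_by_role` is a hypothetical helper.

```python
def scopes_by_role(luis_result: dict, keep_role: str = "scope") -> list[str]:
    """Return normalized values of ScopeName entities tagged with `keep_role`.

    Matches with any other role (e.g. "falsePositive") are dropped.
    Assumes LUIS v2-style "role" and "resolution" fields on each entity.
    """
    scopes = []
    for ent in luis_result.get("entities", []):
        if ent.get("type") != "ScopeName" or ent.get("role") != keep_role:
            continue
        # Prefer the normalized list value; fall back to the matched text
        values = ent.get("resolution", {}).get("values") or [ent.get("entity", "")]
        scopes.append(values[0].upper())
    return scopes

# Hand-written sample response for illustration (not real LUIS output)
sample = {
    "query": "what is the sales org for is",
    "entities": [
        {"entity": "is", "type": "ScopeName", "role": "falsePositive",
         "resolution": {"values": ["IS"]}},
        {"entity": "is", "type": "ScopeName", "role": "scope",
         "resolution": {"values": ["IS"]}},
    ],
}
print(scopes_by_role(sample))  # ['IS']
```

Only the trailing "is" survives; the verb "is" is still recognized by the list entity but is discarded because of its role.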

Upvotes: 1

chrishmorris

Reputation: 322

As you say, a correct parse of the sentence will give you part-of-speech (PoS) tags, which will help you get the right answers. Unfortunately, the examples you show are not grammatical, so even the best parsers may struggle.

Do you have enough curated data to train a neural net? An LSTM might learn enough about the grammar actually used in these sentences to perform named entity recognition (NER) successfully.

In the examples you give, the names to find are all a single token. If that is typical, it will make the job easier.

The comment below says that there is not enough data to train a neural net. However, the few examples given are very stereotyped. Could you instead train a Naive Bayes classifier that uses the previous and next tokens as predictors?
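A minimal from-scratch sketch of that idea in Python, assuming (as noted above) that each entity is a single token: only tokens that appear in the candidate list are classified, using their previous and next tokens as categorical features, with add-one smoothing. The training sentences are invented for illustration, and `ContextNB` and `context_features` are hypothetical names; with real data you would use your own labeled utterances.

```python
from collections import Counter, defaultdict
import math

PAD = "<s>"  # boundary marker for tokens at the start or end of a sentence

def context_features(tokens, i):
    """Predictors for token i: the previous and the next token, lowercased."""
    prev_tok = tokens[i - 1].lower() if i > 0 else PAD
    next_tok = tokens[i + 1].lower() if i + 1 < len(tokens) else PAD
    return {"prev": prev_tok, "next": next_tok}

class ContextNB:
    """Categorical naive Bayes over (prev, next) features with add-one smoothing."""

    def __init__(self):
        self.class_counts = Counter()
        self.feat_counts = defaultdict(Counter)  # (label, feature) -> value counts
        self.vocab = defaultdict(set)            # feature -> values seen in training

    def fit(self, examples):
        for feats, label in examples:
            self.class_counts[label] += 1
            for name, value in feats.items():
                self.feat_counts[(label, name)][value] += 1
                self.vocab[name].add(value)
        return self

    def predict(self, feats):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / total)  # log prior
            for name, value in feats.items():
                counts = self.feat_counts[(label, name)]
                v = len(self.vocab[name]) + 1  # +1 reserves mass for unseen values
                score += math.log((counts[value] + 1) / (n + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Invented training data: (tokens, indices of real scope entities)
train_sentences = [
    (["give", "me", "sales", "org", "for", "fpc", "12234", "for", "is"], {8}),
    (["sales", "org", "for", "fpc", "5678", "for", "me"], {6}),
    (["what", "is", "the", "sales", "org", "for", "an"], {6}),
    (["this", "is", "a", "question"], set()),
    (["it", "is", "not", "here"], set()),
]
CANDIDATES = {"is", "me", "an", "am"}

# One training example per candidate token: 1 = scope entity, 0 = ordinary word
examples = []
for tokens, entity_positions in train_sentences:
    for i, tok in enumerate(tokens):
        if tok.lower() in CANDIDATES:
            examples.append((context_features(tokens, i), int(i in entity_positions)))

model = ContextNB().fit(examples)
query = ["show", "sales", "org", "for", "fpc", "999", "for", "am"]
print(model.predict(context_features(query, 7)))  # 1 (scope entity)
```

"am" was never seen in training, but its context ("for" followed by the end of the sentence) matches the entity examples, so it is classified as a scope. The verb "is" in "it is here" would instead be classified as 0.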

Upvotes: 1
