Reputation: 1088
I have a few sentences like the ones below:
what is the sales org for fpc 1234 for IS?
give me sales org for fpc 12234 for IS?
give me sales org for fpc 12234 with scope ME?
In the above sentences, the entities I'm looking for are IS, IS and ME respectively. The possible entities include IS, ME, AN and AM, which are also common words when constructing a sentence in English. I'm using LUIS for entity detection and maintaining the entities as a list entity. The issue is that, although LUIS is able to detect the entities (IS, AN, AM), it also detects them in ordinary sentences like
what is the sales org for fpc 1234
In the above sentence, we do not have any entity, but the entity IS is picked up.
How do we detect the entities only when they are actually being referred to, and not when they are just part of the sentence construction?
A few points to note:
give me sales org for fpc 12234 for IS?
ME, IS do not occur twice and cannot be used to create a rule.
The question is not specific to LUIS but is about entity extraction in general. I'm looking at POS tagging as well, but that needs the entity to be written in capital letters to identify it as a noun, which may not always be the case.
Upvotes: 2
Views: 333
Reputation: 12264
You've probably figured out that non-machine-learned entities are not ideal in your case because they don't take context into consideration. I think you have a few options.
I just tested by adding your three utterances to an intent named "Sales org" and then creating a simple entity named "Scope." I labeled IS, IS, and ME at the ends of those utterances as the Scope entity. LUIS was then able to correctly identify "is" as the entity but not "me" when I tested "give me sales org for fpc 12234 for is?"
After making a call to LUIS, your bot code can then validate the recognized entity to make sure it's within the list of acceptable values.
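As a rough sketch of that validation step: the snippet below assumes the recognizer result has already been parsed into a list of dicts with "entity" and "type" keys. That shape, the sample data, and the function name valid_scopes are assumptions for illustration, so adapt them to whatever your LUIS client actually returns.

```python
# Acceptable values, taken from the question.
ALLOWED_SCOPES = {"IS", "ME", "AN", "AM"}


def valid_scopes(entities, entity_type="Scope"):
    """Return upper-cased values of recognized entities that are allowed scopes.

    `entities` is assumed to be a list of dicts with "entity" and "type"
    keys; this shape is an assumption, not a documented contract.
    """
    result = []
    for ent in entities:
        if ent.get("type") != entity_type:
            continue
        value = ent.get("entity", "").strip().upper()
        if value in ALLOWED_SCOPES:
            result.append(value)
    return result


# Hypothetical recognizer output for "give me sales org for fpc 12234 for is?"
sample_entities = [
    {"entity": "is", "type": "Scope"},
    {"entity": "fpc", "type": "Scope"},  # a wrong prediction the check drops
]
print(valid_scopes(sample_entities))
```

The machine-learned entity handles the context, and this check only guards against the model labeling a word that is not one of your scope values.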
If you would rather keep a list entity, you can still have LUIS give you contextual information about the entity by using roles.
I just tested by creating an entity named "ScopeName" with your four values IS, ME, AN, and AM. I then created two roles for that entity: "scope" and "falsePositive." Then I labeled the entities in the "Sales org" utterances like this:
If you do this, LUIS will still recognize IS, ME, AN, and AM when they're in the parts of the sentence where you don't want them to be recognized, but you'll know to ignore them in your bot code because they'll be assigned the "falsePositive" role.
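The filtering in your bot code could look roughly like this. It assumes each recognized entity arrives as a dict carrying "entity", "type" and "role" keys; that shape, the sample data, and the function name scopes_to_use are illustrative assumptions, while the entity and role names are the ones from my test above.

```python
def scopes_to_use(entities):
    """Keep ScopeName entities with the "scope" role and drop the rest.

    The dict keys used here are an assumed response shape, not a
    documented contract.
    """
    return [
        ent["entity"].upper()
        for ent in entities
        if ent.get("type") == "ScopeName" and ent.get("role") == "scope"
    ]


# Hypothetical recognizer output for "give me sales org for fpc 12234 for IS?"
sample_with_roles = [
    {"entity": "me", "type": "ScopeName", "role": "falsePositive"},
    {"entity": "is", "type": "ScopeName", "role": "scope"},
]
print(scopes_to_use(sample_with_roles))
```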
Upvotes: 1
Reputation: 322
As you say, a correct parse of the sentence will give you PoS tags which will help you get the right answers. Unfortunately, the examples you show have poor grammar, so even the best parsers may struggle.
Do you have enough curated data to train a neural net? An LSTM might manage to learn enough about the grammar actually used in these sentences to do the NER successfully.
In the examples you give, the names to find are all a single token. If that is typical, it will make the job easier.
The comment below says that there is not enough data to train a neural net. The few examples given are very stereotyped. Is it possible to train a Naive Bayes classifier using previous and next token as the predictors?
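To make the Naive Bayes idea concrete, here is a minimal sketch in plain Python that uses only the previous and next token as predictors, with add-one smoothing. The class name ContextNB, the tokenizer, the boundary markers and the hand-labeled positions in TRAIN are all assumptions for illustration; the training sentences are the ones from the question, which is far too little data for anything beyond a demonstration.

```python
import math
import re
from collections import Counter, defaultdict

CANDIDATES = {"is", "me", "an", "am"}  # ambiguous tokens from the question


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def context(tokens, i):
    """Previous and next token, with boundary markers at the sentence edges."""
    prev_tok = tokens[i - 1] if i > 0 else "<s>"
    next_tok = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
    return prev_tok, next_tok


class ContextNB:
    def __init__(self):
        self.class_counts = Counter()            # label -> number of examples
        self.feat_counts = defaultdict(Counter)  # (slot, label) -> value counts
        self.vocab = defaultdict(set)            # slot -> values seen in training

    def fit(self, examples):
        # Each example is (text, set of token positions that are real entities).
        for text, entity_positions in examples:
            tokens = tokenize(text)
            for i, tok in enumerate(tokens):
                if tok not in CANDIDATES:
                    continue
                label = i in entity_positions
                self.class_counts[label] += 1
                for slot, value in enumerate(context(tokens, i)):
                    self.feat_counts[(slot, label)][value] += 1
                    self.vocab[slot].add(value)
        return self

    def log_score(self, feats, label):
        # Assumes both classes were seen in training (otherwise log(0)).
        total = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total)
        for slot, value in enumerate(feats):
            counts = self.feat_counts[(slot, label)]
            # Add-one smoothing, with one extra slot reserved for unseen values.
            denom = self.class_counts[label] + len(self.vocab[slot]) + 1
            score += math.log((counts[value] + 1) / denom)
        return score

    def extract(self, text):
        tokens = tokenize(text)
        found = []
        for i, tok in enumerate(tokens):
            if tok in CANDIDATES:
                feats = context(tokens, i)
                if self.log_score(feats, True) > self.log_score(feats, False):
                    found.append(tok.upper())
        return found


# Hand-labeled entity positions (0-based token indices), an assumption.
TRAIN = [
    ("what is the sales org for fpc 1234 for IS?", {9}),
    ("give me sales org for fpc 12234 for IS?", {8}),
    ("give me sales org for fpc 12234 with scope ME?", {9}),
    ("what is the sales org for fpc 1234", set()),
]

model = ContextNB().fit(TRAIN)
print(model.extract("what is the sales org for fpc 99"))
print(model.extract("sales org for fpc 555 for AN?"))
```

Because the classifier only looks at neighboring tokens, it does not depend on capitalization, which sidesteps the PoS-tagging problem mentioned in the question. It will only be as good as the variety of contexts in the training sentences.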
Upvotes: 1