Reputation: 1
This is the Python code I have written to run NER (named entity recognition) on a use-case scenario that the user enters as text, using a Jupyter notebook.
First of all, I wrote code to input the scenario as text.
text = (
    "customer must be registered if wants to buy the product.unregistered user can’t go to "
    "the shopping cart. Customer logins to the system by entering valid user id and password for "
    "the shopping. customer can make order or cancel order of the product from the shopping cart "
    "after login or registration. Customer has to logout after ordering or surfing for the product "
)
As the next step, I convert it to a string.
text_combined = str(text)
Second, I put it into a doc.
doc = nlp(text_combined)
Then I wrote the NER code. I have attached a screenshot of the output.
for ent in doc.ents:
    print(ent.text, ent.label_)
Finally, I expected entities such as "customer" to be labelled as a person, but the code identified it as an organization (screenshot attached). Can you explain why that is? And is there a way to solve this problem?
spacy.displacy.render(doc, style='ent',jupyter=True)
Upvotes: 1
Views: 990
Reputation: 15593
The spaCy models are trained on newspaper-like texts. Some of the labels they have are things like PER (Person) and ORG (Organization). But they learn what these are from newspaper articles. So if you have a news article like this...
John Smith of Eggplant Limited reported a new product today...
Then it would be labelled like this:
[John Smith PER] of [Eggplant Limited ORG] reported a new product today...
So the named entities are proper nouns.
In your example "Customer" is not a proper noun, so there's no reason it would be tagged as PER. It's a little weird that it's tagged as ORG, and I'd consider that an error. As to why it has an error there, it's hard to say exactly, but models aren't perfect and they do have errors, so you have to be able to deal with issues like that in your application.
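One simple way to deal with this kind of error in an application is to filter the model's output afterwards. The following is only a minimal sketch, not something spaCy provides: `ROLE_WORDS` and `drop_role_words` are hypothetical names, the word list is an assumption about your domain, and the sample input is made up to mimic what `[(ent.text, ent.label_) for ent in doc.ents]` would give you.

```python
# Hypothetical post-processing step; none of these names come from spaCy.

# Assumed list of common nouns from the use-case domain that should
# never be reported as named entities.
ROLE_WORDS = {"customer", "user"}


def drop_role_words(entities, role_words=ROLE_WORDS):
    """Return only the (text, label) pairs whose text is not a known role word."""
    return [
        (text, label)
        for text, label in entities
        if text.strip().lower() not in role_words
    ]


# Made-up sample in the shape of [(ent.text, ent.label_) for ent in doc.ents].
sample = [("Customer", "ORG"), ("Eggplant Limited", "ORG")]
filtered = drop_role_words(sample)
print(filtered)
```

The comparison is case-insensitive, so "Customer" at the start of a sentence is dropped as well, while real proper nouns pass through untouched. This does not make the model any better; it only hides one known mistake, so the word list has to be maintained by hand.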
Upvotes: 1