Reputation: 207
I am trying to use NLTK to identify the person, organization, and place mentioned in a sentence.
My use case is to extract the auditor's name, organization, and location from an annual financial report.
With NLTK in Python, the results are not satisfactory:
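import nltk
from nltk import ne_chunk  # this import was missing; ne_chunk is used below
from nltk.tokenize import word_tokenize
from nltk.tag import pos_tag
ex = 'Alastair John Richard Nuttall (Senior statutory auditor) for and on behalf of Ernst & Young LLP (Statutory auditor) Leeds'
# Tokenize, POS-tag, then run NLTK's default named-entity chunker
ne_tree = ne_chunk(pos_tag(word_tokenize(ex)))
print(ne_tree)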
Tree('S', [Tree('PERSON', [('Alastair', 'NNP')]), Tree('PERSON', [('John', 'NNP'), ('Richard', 'NNP'), ('Nuttall', 'NNP')]), ('(', '('), Tree('ORGANIZATION', [('Senior', 'NNP')]), ('statutory', 'NNP'), ('auditor', 'NN'), (')', ')'), ('for', 'IN'), ('and', 'CC'), ('on', 'IN'), ('behalf', 'NN'), ('of', 'IN'), Tree('GPE', [('Ernst', 'NNP')]), ('&', 'CC'), Tree('PERSON', [('Young', 'NNP'), ('LLP', 'NNP')]), ('(', '('), ('Statutory', 'NNP'), ('auditor', 'NN'), (')', ')'), ('Leeds', 'NNS')])
As seen above, 'Leeds' is not identified as a place, nor is 'Ernst & Young LLP' recognized as an organization ('Ernst' is tagged GPE and 'Young LLP' is tagged PERSON).
Are there any better ways of achieving this in Python?
Upvotes: 0
Views: 606
Reputation: 11474
Try spaCy instead of NLTK:
https://spacy.io/usage/linguistic-features#named-entities
I think spaCy's pretrained models are likely to perform better. The results for your sentence (with spaCy 2.1 and the en_core_web_lg model) are:
Alastair John Richard Nuttall PERSON
Ernst & Young LLP ORG
Leeds GPE
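For reference, here is a minimal sketch of how output like the above could be produced. It assumes spaCy and the en_core_web_lg model are installed. The helper names `extract_entities` and `run_spacy` are made up for illustration and are not part of spaCy. Results may differ between spaCy and model versions.

```python
from typing import List, Tuple


def extract_entities(doc, wanted=("PERSON", "ORG", "GPE")) -> List[Tuple[str, str]]:
    """Return (text, label) pairs for entities whose label is in `wanted`.

    `doc` is anything exposing `.ents`, where each entity has `.text` and
    `.label_` (as a spaCy Doc does). Hypothetical helper, not a spaCy API.
    """
    return [(ent.text, ent.label_) for ent in doc.ents if ent.label_ in wanted]


def run_spacy(text: str, model: str = "en_core_web_lg") -> List[Tuple[str, str]]:
    """Load a pretrained spaCy pipeline and extract entities from `text`.

    Assumption: spaCy and the named model are installed. The import is done
    lazily so that this module can be loaded without spaCy present.
    """
    import spacy

    nlp = spacy.load(model)  # load the pretrained pipeline
    return extract_entities(nlp(text))
```

With the model installed (for example via `python -m spacy download en_core_web_lg`), calling `run_spacy(ex)` on the sentence from the question and printing each pair gives one entity per line. Filtering on PERSON, ORG, and GPE keeps only the three entity types the question asks about.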
Upvotes: 1