abhi1489

Reputation: 207

Named Entity Recognition using NLTK: Extract Auditor name, address and organisation

I am trying to use NLTK to identify the Person, Organization and Place entities in a sentence.

My use case is to extract the auditor's name, organization and place from an annual financial report.

With NLTK in Python, the results are not satisfactory:

import nltk
from nltk import ne_chunk  # needed for the ne_chunk call below
from nltk.tokenize import word_tokenize
from nltk.tag import pos_tag

ex='Alastair John Richard Nuttall (Senior statutory auditor) for and on behalf of Ernst & Young LLP (Statutory auditor) Leeds'

ne_tree = ne_chunk(pos_tag(word_tokenize(ex)))

print(ne_tree)

Tree('S', [Tree('PERSON', [('Alastair', 'NNP')]), Tree('PERSON', [('John', 'NNP'), ('Richard', 'NNP'), ('Nuttall', 'NNP')]), ('(', '('), Tree('ORGANIZATION', [('Senior', 'NNP')]), ('statutory', 'NNP'), ('auditor', 'NN'), (')', ')'), ('for', 'IN'), ('and', 'CC'), ('on', 'IN'), ('behalf', 'NN'), ('of', 'IN'), Tree('GPE', [('Ernst', 'NNP')]), ('&', 'CC'), Tree('PERSON', [('Young', 'NNP'), ('LLP', 'NNP')]), ('(', '('), ('Statutory', 'NNP'), ('auditor', 'NN'), (')', ')'), ('Leeds', 'NNS')])

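As seen above, 'Leeds' is not identified as a place (it is tagged NNS and left outside any entity), and 'Ernst & Young LLP' is not recognized as an organization: 'Ernst' is labelled GPE and 'Young LLP' is labelled PERSON.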
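
To compare the output against what is expected, the tree printed above can be flattened into (text, label) pairs. This is only a minimal sketch: flatten_entities is a hypothetical helper name, and it assumes the tree is one level deep, with entity chunks as labelled subtrees and other tokens as (word, tag) tuples.

```python
def flatten_entities(tree):
    """Return (entity_text, label) pairs from an ne_chunk-style tree.

    Assumes entity chunks are subtrees exposing label() and leaves(),
    while ordinary tokens are plain (word, tag) tuples.
    """
    entities = []
    for node in tree:
        # Tuples have no label attribute, so only entity subtrees pass.
        if hasattr(node, "label"):
            words = [word for word, _tag in node.leaves()]
            entities.append((" ".join(words), node.label()))
    return entities
```

Calling flatten_entities(ne_tree) on the tree above should list each chunk with its label, which makes the split of 'Ernst & Young LLP' into separate chunks easy to see.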

Are there any better ways of achieving this in Python?

Upvotes: 0

Views: 606

Answers (1)

aab

Reputation: 11474

Try spaCy instead of NLTK:

https://spacy.io/usage/linguistic-features#named-entities

I think spaCy's pretrained models are likely to perform better on this kind of text. The results for your sentence (with spaCy 2.1 and the en_core_web_lg model) are:

Alastair John Richard Nuttall PERSON
Ernst & Young LLP ORG
Leeds GPE
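
Results like the ones above can be produced with something along these lines. It is a minimal sketch, assuming spaCy and the en_core_web_lg model are already installed; extract_entities and entities_from_doc are hypothetical helper names, and the optional label filter is my addition rather than part of spaCy. Other spaCy or model versions may return different entities.

```python
def entities_from_doc(doc, wanted=None):
    """Return (text, label) pairs for the entities in a processed doc.

    `wanted` is an optional collection of labels to keep (an assumption
    for illustration); None keeps every entity.
    """
    return [
        (ent.text, ent.label_)
        for ent in doc.ents
        if wanted is None or ent.label_ in wanted
    ]


def extract_entities(text, model_name="en_core_web_lg"):
    """Load a spaCy model and return the entities found in `text`."""
    import spacy  # imported here so the helper above works without spaCy

    nlp = spacy.load(model_name)
    return entities_from_doc(nlp(text))


# Usage (requires the model to be downloaded first):
# for text, label in extract_entities(ex):
#     print(text, label)
```

Passing wanted=("PERSON", "ORG", "GPE") to entities_from_doc keeps only the three entity types the question asks about.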

Upvotes: 1
