Reputation: 507
I am studying NLTK with the book Natural Language Processing with Python Cookbook.
The book gives the following code, but with no explanation at all.
import nltk

grammar = r"NAMED-ENTITY: {<NNP>+}"
cp = nltk.RegexpParser(grammar)

samplestrings = [
    "Microsoft Azure is a cloud service",
    "Bill Gates announces Satya Nadella as new CEO of Microsoft",
]

def demo(samplestrings):
    for s in samplestrings:
        words = nltk.word_tokenize(s)   # split the sentence into tokens
        tagged = nltk.pos_tag(words)    # attach a part-of-speech tag to each token
        # chunks = nltk.ne_chunk(tagged)
        chunks = cp.parse(tagged)       # apply the regex chunk grammar
        print(nltk.tree2conllstr(chunks))
        print(chunks)

demo(samplestrings)
So I am stuck on the first line.
What does grammar = r"NAMED-ENTITY: {<NNP>+}" do?
Does it mean that any run of one or more consecutive words tagged NNP is treated as a named entity?
Thanks in advance for any answer.
Upvotes: 1
Views: 155
Reputation: 2126
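In this example the grammar defines a single chunk rule for a regex chunk parser. NAMED-ENTITY is the label given to each chunk (not the name of the parser), and {<NNP>+} is the pattern: the braces mark what to chunk, <NNP> matches one token tagged NNP, and + means "one or more", as in an ordinary regular expression. So yes, every run of one or more consecutive NNP-tagged words is grouped into one NAMED-ENTITY chunk.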
cp = nltk.RegexpParser(r"NAMED-ENTITY: {<NNP>+}")
NNP is the Penn Treebank part-of-speech tag for a singular proper noun.
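To make the effect of the pattern concrete, here is a rough plain-Python sketch of the grouping it describes. This is not how NLTK implements RegexpParser; the helper name chunk_proper_nouns is made up for illustration, and the tags are written by hand (I am assuming plausible tags, and the real output of nltk.pos_tag may differ).

```python
from itertools import groupby

def chunk_proper_nouns(tagged):
    """Collect each maximal run of consecutive NNP tokens, imitating {<NNP>+}.

    Hypothetical helper for illustration only; not part of NLTK.
    """
    chunks = []
    # groupby splits the sequence into runs that share the same key,
    # here "is this token tagged NNP or not".
    for is_nnp, run in groupby(tagged, key=lambda pair: pair[1] == "NNP"):
        if is_nnp:
            chunks.append(" ".join(word for word, _ in run))
    return chunks

# Hand-written (word, tag) pairs; assumed tags, not actual pos_tag output.
tagged = [
    ("Bill", "NNP"), ("Gates", "NNP"), ("announces", "VBZ"),
    ("Satya", "NNP"), ("Nadella", "NNP"), ("as", "IN"),
    ("new", "JJ"), ("CEO", "NNP"), ("of", "IN"), ("Microsoft", "NNP"),
]

print(chunk_proper_nouns(tagged))
# ['Bill Gates', 'Satya Nadella', 'CEO', 'Microsoft']
```

Note that "CEO" comes out as a chunk under these assumed tags: the rule only looks at part-of-speech tags, so it groups proper nouns without checking whether they really are named entities.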
Upvotes: 1