Reputation: 349
I have checked previous related threads, but they did not solve my issue. I have written code to extract named entities (NER) from text.
import nltk

text = "Stallone jason's film Rocky was inducted into the National Film Registry as well as having its film props placed in the Smithsonian Museum."
tokenized = nltk.word_tokenize(text)
tagged = nltk.pos_tag(tokenized)
namedEnt = nltk.ne_chunk(tagged, binary = True)
print namedEnt
namedEnt = nltk.ne_chunk(tagged, binary = False)
which gives this sort of result:
(S
(NE Stallone/NNP)
jason/NN
's/POS
film/NN
(NE Rocky/NNP)
was/VBD
inducted/VBN
into/IN
the/DT
(NE National/NNP Film/NNP Registry/NNP)
as/IN
well/RB
as/IN
having/VBG
its/PRP$
film/NN
props/NNS
placed/VBN
in/IN
the/DT
(NE Smithsonian/NNP Museum/NNP)
./.)
whereas I expect only the named entities as a result, like:
Stallone
Rocky
National Film Registry
Smithsonian Museum
How can I achieve this?
UPDATE
result = ' '.join([y[0] for y in x.leaves()]) for x in namedEnt.subtrees() if x.node == "NE"
print result
gives a syntax error. What is the correct way to write this?
UPDATE2
text = "Stallone jason's film Rocky was inducted into the National Film Registry as well as having its film props placed in the Smithsonian Museum."
tokenized = nltk.word_tokenize(text)
tagged = nltk.pos_tag(tokenized)
namedEnt = nltk.ne_chunk(tagged, binary = True)
print namedEnt
np = [' '.join([y[0] for y in x.leaves()]) for x in namedEnt.subtrees() if x.node == "NE"]
print np
error:
np = [' '.join([y[0] for y in x.leaves()]) for x in namedEnt.subtrees() if x.node == "NE"]
File "/usr/local/lib/python2.7/dist-packages/nltk/tree.py", line 198, in _get_node
raise NotImplementedError("Use label() to access a node label.")
NotImplementedError: Use label() to access a node label.
So I tried:
np = [' '.join([y[0] for y in x.leaves()]) for x in namedEnt.subtrees() if x.label() == "NE"]
which gives an empty result.
Upvotes: 1
Views: 1725
Reputation: 3325
The namedEnt returned is actually a Tree object, which is a subclass of list. You can do the following to parse it:
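[' '.join([y[0] for y in x.leaves()]) for x in namedEnt.subtrees() if x.label() == "NE"]
(On NLTK versions before 3.0, use x.node instead of x.label(); newer versions raise the NotImplementedError shown in the question.)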
Output:
['Stallone', 'Rocky', 'National Film Registry', 'Smithsonian Museum']
The binary flag, when set to True, indicates only whether a subtree is a named entity or not, which is what we need above. When set to False, it gives more information, such as whether the NE is an Organization, Person, etc. For some reason, the results with the flag on and off don't seem to agree with one another.
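As a rough sketch of the same idea for NLTK 3 and later (where Tree.label() replaces the old node attribute), the helper below collects the chunks that sit directly under the root and optionally filters them by label. The names extract_entities and sample are made up for illustration, and the tree is built by hand to mirror part of the output in the question, so the snippet does not need the NLTK model downloads. One possible reason for the empty result in the question is that the first snippet reassigns namedEnt with binary=False after printing; in that mode the chunk labels are entity types such as PERSON or ORGANIZATION rather than NE, so a filter on "NE" would match nothing.

```python
from nltk import Tree


def extract_entities(chunked, labels=None):
    """Return (label, text) pairs for each chunk directly under the root.

    If `labels` is given, only chunks whose label is in it are kept.
    Assumes the flat layout shown in the question: chunks are direct
    children of the root, and plain tokens are (word, tag) tuples.
    """
    entities = []
    for child in chunked:
        # Skip ordinary tokens; only chunks are Tree objects.
        if not isinstance(child, Tree):
            continue
        if labels is not None and child.label() not in labels:
            continue
        text = ' '.join(word for word, tag in child.leaves())
        entities.append((child.label(), text))
    return entities


# Hand-built stand-in for part of the binary=True output in the question.
sample = Tree('S', [
    Tree('NE', [('Stallone', 'NNP')]),
    ('jason', 'NN'),
    ("'s", 'POS'),
    ('film', 'NN'),
    Tree('NE', [('Rocky', 'NNP')]),
    ('was', 'VBD'),
    Tree('NE', [('National', 'NNP'), ('Film', 'NNP'), ('Registry', 'NNP')]),
])

print([text for label, text in extract_entities(sample, labels={'NE'})])
```

With a real result from nltk.ne_chunk(tagged, binary=False), calling extract_entities(namedEnt) without a labels argument should return every chunk together with its entity type, which makes it easy to see which labels the chunker actually produced.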
Upvotes: 3