Reputation: 409
I'm using NLTK with a regular-expression chunk grammar to analyze my text. The parser correctly identifies the chunk I defined, but the printed result contains every tagged word along with the "My_Chunk" subtrees. How can I print only the chunked part of the text ("My_Chunk")?
Here is my code example:
import re
import nltk

text = ['The absolutely kind professor asked students out whom he met in class']

for item in text:
    tokenized = nltk.word_tokenize(item)
    tagged = nltk.pos_tag(tokenized)
    chunk = r"""My_Chunk: {<RB.?>*<NN.?>*<VBD.?>}"""
    chunkParser = nltk.RegexpParser(chunk)
    chunked = chunkParser.parse(tagged)
    print(chunked)
    chunked.draw()
And the printed result is:
(S
  The/DT
  (My_Chunk absolutely/RB kind/NN professor/NN asked/VBD)
  students/NNS
  out/RP
  whom/WP
  he/PRP
  (My_Chunk met/VBD)
  in/IN
  class/NN)
Upvotes: 1
Views: 689
Reputation: 2139
This should do it:
for a in chunked:
    if isinstance(a, nltk.tree.Tree):
        if a.label() == "My_Chunk":
            print(a)
            print(" ".join([lf[0] for lf in a.leaves()]))
            print()

# (My_Chunk absolutely/RB kind/NN professor/NN asked/VBD)
# absolutely kind professor asked
#
# (My_Chunk met/VBD)
# met
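If you would rather collect the phrases than print them inside the loop, something like the following might work. It is only a sketch: `chunk_phrases` is a hypothetical helper name, not part of NLTK, and the tagged tokens are hard-coded from the output in the question so that the tokenizer and tagger models are not needed. It relies on `Tree.subtrees(filter=...)`, which also finds chunks nested deeper in the tree, unlike the top-level loop above.

```python
import nltk

# Pre-tagged tokens copied from the question's printed output (an assumption
# made here so the example runs without downloading tokenizer/tagger data).
tagged = [
    ("The", "DT"), ("absolutely", "RB"), ("kind", "NN"), ("professor", "NN"),
    ("asked", "VBD"), ("students", "NNS"), ("out", "RP"), ("whom", "WP"),
    ("he", "PRP"), ("met", "VBD"), ("in", "IN"), ("class", "NN"),
]

# Same chunk grammar as in the question.
grammar = r"My_Chunk: {<RB.?>*<NN.?>*<VBD.?>}"
chunked = nltk.RegexpParser(grammar).parse(tagged)


def chunk_phrases(tree, label):
    """Hypothetical helper: return the words of every subtree with `label`."""
    matches = tree.subtrees(filter=lambda t: t.label() == label)
    # Each leaf is a (word, tag) pair; keep only the word.
    return [" ".join(word for word, _tag in st.leaves()) for st in matches]


phrases = chunk_phrases(chunked, "My_Chunk")
for phrase in phrases:
    print(phrase)
```

Returning a list keeps the extraction separate from the printing, so the same phrases can be reused for counting or further filtering.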
Upvotes: 2