Zia

Reputation: 409

How to print only the string result of chunking with NLTK?

I'm using NLTK with a regular-expression chunk grammar to analyze my text. The parser correctly identifies the chunk that I defined, but the printed result shows every tagged word along with the "My_Chunk" subtrees. How can I print only the chunked part of the text ("My_Chunk")?

Here is my code example:

import re
import nltk

text = ['The absolutely kind professor asked students out whom he met in class']

for item in text:
    tokenized = nltk.word_tokenize(item)
    tagged = nltk.pos_tag(tokenized)

    chunk = r"""My_Chunk: {<RB.?>*<NN.?>*<VBD.?>}"""
    chunkParser = nltk.RegexpParser(chunk)

    chunked = chunkParser.parse(tagged)
    print(chunked)
    chunked.draw()

And the printed result is:

(S
  The/DT
  (My_Chunk absolutely/RB kind/NN professor/NN asked/VBD)
  students/NNS
  out/RP
  whom/WP
  he/PRP
  (My_Chunk met/VBD)
  in/IN
  class/NN)

Upvotes: 1

Views: 689

Answers (1)

DBaker

Reputation: 2139

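chunkParser.parse() returns an nltk.Tree. Iterating over it yields plain (word, tag) tuples for unchunked tokens and nested Tree objects for chunks, so keep only the subtrees labelled "My_Chunk" and join the words of their leaves. This should do it: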

for a in chunked:
    if isinstance(a, nltk.tree.Tree):
        if a.label() == "My_Chunk":
            print(a)
            print(" ".join([lf[0] for lf in a.leaves()]))
            print()

#(My_Chunk absolutely/RB kind/NN professor/NN asked/VBD)
#absolutely kind professor asked

#(My_Chunk met/VBD)
#met
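
As an alternative, here is a minimal sketch that collects the chunk strings into a list using Tree.subtrees() with a filter, rather than printing inside the loop. The helper name chunk_strings is hypothetical, and the tokens are hand-tagged (copied from the output in the question) so the example runs without downloading the tokenizer or tagger models.

```python
import nltk

# Hand-tagged tokens, copied from the question's output (an assumption made
# so this sketch does not depend on nltk.word_tokenize or nltk.pos_tag data).
tagged = [
    ("The", "DT"), ("absolutely", "RB"), ("kind", "NN"), ("professor", "NN"),
    ("asked", "VBD"), ("students", "NNS"), ("out", "RP"), ("whom", "WP"),
    ("he", "PRP"), ("met", "VBD"), ("in", "IN"), ("class", "NN"),
]

# Same grammar as in the question.
chunk_parser = nltk.RegexpParser(r"My_Chunk: {<RB.?>*<NN.?>*<VBD.?>}")
chunked = chunk_parser.parse(tagged)


def chunk_strings(tree, label="My_Chunk"):
    """Return the words of each subtree carrying `label`, joined by spaces.

    Hypothetical helper, not part of NLTK.
    """
    return [
        " ".join(word for word, _tag in subtree.leaves())
        for subtree in tree.subtrees(filter=lambda t: t.label() == label)
    ]


for phrase in chunk_strings(chunked):
    print(phrase)
```

Returning a list keeps the extraction separate from the printing, which may be handy if the chunk strings are needed later for counting or further processing.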


Upvotes: 2
