Sahil Kamboj

Reputation: 418

How to get sentence after chunking in NLTK?

I have a sentence as follows:

txt =  "i am living in the West Bengal and my brother live in New York. My name is John Smith"

What I need is:

  1. Get the chunks labeled GPE or LOCATION and join the words of each chunk with "_".
  2. Get the chunks labeled PERSON and remove them.

The output I need:

preprocessed_txt =  "i am living in the West_Bengal and my brother live in New_York. My name is "

I used the code from "NLTK Named Entity recognition to a Python list" to get the labels of the chunks.

import nltk
for sent in nltk.sent_tokenize(txt):
   for chunk in nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(sent))):
      if hasattr(chunk, 'label'):
         print(chunk.label(), '_'.join(c[0] for c in chunk))

This gave me the following output:

LOCATION West_Bengal
GPE New_York
PERSON John_Smith

What should I do next to get the preprocessed sentence?

Upvotes: 0

Views: 740

Answers (1)

popeye

Reputation: 886

This should be all you need:

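import nltk

tokens = nltk.word_tokenize(txt)
new = []
for chunk in nltk.ne_chunk(nltk.pos_tag(tokens)):
  try:
    # Named-entity chunks are subtrees with a label; drop PERSON chunks
    if chunk.label().lower() == 'person':
      continue
    # Join the words of any other entity (e.g. GPE, LOCATION) with "_"
    new.append('_'.join(c[0] for c in chunk))
  except AttributeError:
    # Plain (word, tag) tuples have no label(); keep the word as-is
    new.append(chunk[0])

print(' '.join(new))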
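
Note that joining the tokens with a single space leaves a space in front of punctuation (for example "New_York ." instead of "New_York."). If that matters, here is a rough sketch of the same loop wrapped in a function that also tidies that spacing. The name `rebuild_text` and the `drop_labels` parameter are made up for illustration, and the regex only handles a few common punctuation marks, so treat it as a starting point rather than a full detokenizer:

```python
import re


def rebuild_text(chunks, drop_labels=("PERSON",)):
    """Join ne_chunk-style output back into a single string.

    `chunks` is assumed to be an iterable of either (word, tag) tuples
    or tree-like objects that expose .label() and iterate over
    (word, tag) leaves, which is what nltk.ne_chunk yields for flat
    named-entity chunks.
    """
    pieces = []
    for chunk in chunks:
        if hasattr(chunk, "label"):
            # Labeled chunk: skip unwanted labels, underscore-join the rest
            if chunk.label() in drop_labels:
                continue
            pieces.append("_".join(word for word, _tag in chunk))
        else:
            # Plain (word, tag) tuple: keep the word
            pieces.append(chunk[0])
    text = " ".join(pieces)
    # Remove the space that " ".join leaves before common punctuation
    return re.sub(r"\s+([.,!?;:])", r"\1", text)
```

With NLTK and its tokenizer, tagger and chunker data installed, it could be called as `rebuild_text(nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(txt))))`. The exact result still depends on which entities the chunker recognizes.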

Upvotes: 1
