Reputation: 418
I have a sentence as follows:
txt = "i am living in the West Bengal and my brother live in New York. My name is John Smith"
I need each multi-word named entity joined with an underscore and person names removed. The output I need is:
preprocessed_txt = "i am living in the West_Bengal and my brother live in New_York. My name is "
I used the code from "NLTK Named Entity recognition to a Python list" to get the labels of the chunks:
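import nltk

# Requires the NLTK tokenizer, POS tagger and NE chunker data (via nltk.download).
for sent in nltk.sent_tokenize(txt):
    for chunk in nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(sent))):
        # Named-entity chunks are subtrees with a label; plain tokens are (word, tag) tuples.
        if hasattr(chunk, 'label'):
            print(chunk.label(), '_'.join(c[0] for c in chunk))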
This gave me the following output:
LOCATION West_Bengal
GPE New_York
PERSON John_Smith
How do I get from these labels to the output I need?
Upvotes: 0
Views: 740
Reputation: 886
This should be all you need:
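import nltk

tokens = nltk.word_tokenize(txt)  # txt is the sentence from the question
new = []
for chunk in nltk.ne_chunk(nltk.pos_tag(tokens)):
    try:
        # Named-entity chunks are subtrees with a label.
        if chunk.label().lower() == 'person':
            continue  # drop person names
        else:
            new.append('_'.join(c[0] for c in chunk))
    except AttributeError:
        # Plain tokens are (word, tag) tuples and have no label().
        new.append(chunk[0])
print(' '.join(new))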
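Note that joining the tokens with a single space leaves a space before punctuation (for example "New_York ." rather than "New_York."). As a rough sketch, the same loop could be wrapped in a function that also tidies this up. The names join_entities and drop_labels are hypothetical, and the punctuation regex is my own assumption rather than anything NLTK provides:

```python
import re


def join_entities(chunks, drop_labels=("PERSON",)):
    """Join multi-word entities with '_' and drop entities with unwanted labels.

    `chunks` is anything iterable that yields either (word, tag) tuples or
    tree-like objects with a label() method, such as the result of nltk.ne_chunk.
    """
    out = []
    for chunk in chunks:
        if hasattr(chunk, "label"):
            # Named entity: skip it, or merge its words into one token.
            if chunk.label() in drop_labels:
                continue
            out.append("_".join(word for word, _tag in chunk))
        else:
            # Plain (word, tag) tuple: keep the word as-is.
            out.append(chunk[0])
    text = " ".join(out)
    # Assumption: remove the space that tokenization leaves before punctuation.
    return re.sub(r"\s+([.,!?;:])", r"\1", text)
```

With NLTK and its data installed, it would be called as join_entities(nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(txt)))). Passing a different drop_labels tuple, such as ("PERSON", "ORGANIZATION"), would drop other entity types as well.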
Upvotes: 1