NLTK to Identify Names

Question

I am attepmting to extract names with the nltk python module.

import nltk
#!pip install svgling

nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('maxent_ne_chunker')
nltk.download('words')

import nltk

from nltk import ne_chunk, pos_tag, word_tokenize
from nltk.tree import Tree

text = "Elon Musk 889-888-8888 elonpie@tessa.net Jeff Bezos (345)123-1234 bezzi@zonbi.com Reshma Saujani example.email@email.com 888-888-8888 Barkevious Mingo"

nltk_results = ne_chunk(pos_tag(word_tokenize(text)))
for nltk_result in nltk_results:
    if type(nltk_result) == Tree:
        name = ''
        for nltk_result_leaf in nltk_result.leaves():
            name += nltk_result_leaf[0] + ' '
        print ('Type: ', nltk_result.label(), 'Name: ', name)

The output I get from the following code above is as follows:

Type:  PERSON Name:  Elon 
Type:  GPE Name:  Musk 
Type:  PERSON Name:  Jeff Bezos 
Type:  ORGANIZATION Name:  Barkevious Mingo

This is not correct. First of all, Some names are broken up. Farily common ones, too, like Elon Musk. Next, all names are not identified. The desired output would be:

Type:  PERSON Name:  Elon Musk
Type:  PERSON Name:  Jeff Bezos
Type:  PERSON Name:  Reshma Saujani 
Type:  PERSON Name:  Barkevious Mingo

Is there a better option in python?

NLTK to Identify Names

Answers (1)

Related Questions