Reputation: 33

How to remove words that are not names of people

I want to extract people's names from a text file. I use NLTK, but it returns names as well as words that are not names:

import nltk
from nltk import pos_tag, ne_chunk

def extract_names(text):
    tokens = nltk.tokenize.word_tokenize(text)
    pos = pos_tag(tokens)
    sentt = ne_chunk(pos, binary=False)  # binary=False keeps labels such as PERSON
    person_list = []
    for subtree in sentt.subtrees(filter=lambda t: t.label() == 'PERSON'):
        person = [leaf[0] for leaf in subtree.leaves()]  # leaves are (word, tag) pairs
        if len(person) > 1:  # avoid grabbing lone surnames
            name = ' '.join(person)
            name = remove_useless_name(name)  # the asker's own helper, not shown in the question
            if name not in person_list:
                person_list.append(name)

    return person_list

I want to remove the entries that are not names. Which method should I use to filter them out? Sample input:

"Sunder Pichai"
"View Profile"
"Risk Management"

Sample output:

"Sunder Pichai"

Upvotes: 1

Answers (2)

DYZ

Reputation: 57085

NLTK provides a corpus of English words (nltk.corpus.words.words('en')) and a corpus of common English first names (nltk.corpus.names.words()). Unfortunately, the latter would not contain Sunder or Pichai, so you have to rely on the former. A further complication is that some names are also common English words (e.g., Hope). You can still automate the task to some extent:

import nltk

# requires the corpus data: nltk.download('words')
words = set(nltk.corpus.words.words('en'))

def isname1(string):
    # True if at least one word is NOT in the English dictionary
    return any(w not in words for w in string.lower().split())

def isname2(string):
    # True only if NONE of the words is in the English dictionary
    return all(w not in words for w in string.lower().split())

list(map(isname1, ["Sunder Pichai", "View Profile", "Risk Management"]))
#[True, False, False]
list(map(isname2, ["Sunder Pichai", "View Profile", "Risk Management"]))
#[False, False, False]

As you can see, the second function is stricter: it accepts a string only if none of its words is in the dictionary, so it does not recognize "Sunder Pichai" as a name (because "sunder" is actually an English word).
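One way to reduce the problem mentioned above (names such as Hope that are also dictionary words) is to consult the names corpus as well. The following is a minimal sketch under that assumption, not part of the original answer: a phrase counts as a name if any of its words is a known first name or is missing from the dictionary. The function `is_name` is hypothetical and takes both vocabularies as parameters, so it can be tried with small hand-made sets; the commented lines show how they could be built from the NLTK corpora.

```python
def is_name(phrase, english_words, first_names):
    """Heuristic: True if any word of `phrase` is a known first name
    or is not an English dictionary word."""
    # Compare case-insensitively; the names corpus stores names capitalized (e.g. "Hope").
    known_names = {n.lower() for n in first_names}
    tokens = phrase.lower().split()
    return any(t in known_names or t not in english_words for t in tokens)

# With NLTK (after nltk.download('words') and nltk.download('names')):
# import nltk
# english_words = set(nltk.corpus.words.words('en'))
# first_names = nltk.corpus.names.words()

# Toy vocabularies for illustration only (hypothetical, not the real corpora):
english_words = {"sunder", "view", "profile", "risk", "management", "hope", "smith"}
first_names = ["Hope", "John"]

phrases = ["Sunder Pichai", "View Profile", "Risk Management", "Hope Smith"]
print([p for p in phrases if is_name(p, english_words, first_names)])
# ['Sunder Pichai', 'Hope Smith']
```

The trade-off is that this check is more permissive than isname1, so it will also accept more phrases that are not names.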

Upvotes: 2

lezsakdomi

Reputation: 125

Maybe use a dictionary: check whether all parts of the name are real words, and/or whether the surname is a known name.
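A minimal sketch of this suggestion, assuming you have a set of dictionary words and a set of known surnames. Both sets and the function `looks_like_name` are hypothetical; NLTK's names corpus contains first names, so a surname list would have to come from another source. The last word of the phrase is treated as the surname.

```python
def looks_like_name(phrase, dictionary, known_surnames):
    """Heuristic sketch: accept a phrase if its last part is a known surname;
    otherwise reject it when every part is an ordinary dictionary word."""
    parts = phrase.lower().split()
    if not parts:
        return False
    if parts[-1] in known_surnames:  # the surname is a known name
        return True
    return not all(p in dictionary for p in parts)  # all parts are real words -> not a name

# Toy data for illustration only (hypothetical):
dictionary = {"sunder", "view", "profile", "risk", "management", "hope", "smith"}
known_surnames = {"smith"}

phrases = ["Sunder Pichai", "View Profile", "Risk Management", "Hope Smith"]
print([p for p in phrases if looks_like_name(p, dictionary, known_surnames)])
# ['Sunder Pichai', 'Hope Smith']
```

Here "Sunder Pichai" passes because "pichai" is not a dictionary word, while "Hope Smith" passes only because "smith" is in the surname set.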

Upvotes: 0
