akrama81

Reputation: 351

NLTK's WordNet lemmatizer not lemmatizing all words

I'm trying to lemmatize the words in a text. For example, 'pickled' should become 'pickle', 'ran' should become 'run', 'raisins' should become 'raisin', and so on.

I'm using nltk's WordNet Lemmatizer as follows:

>>> from nltk.stem import WordNetLemmatizer
>>> lem = WordNetLemmatizer()
>>> print(lem.lemmatize("cats"))
cat
>>> print(lem.lemmatize("pickled"))
pickled
>>> print(lem.lemmatize("ran"))
ran

As you can see, the output for 'pickled' and 'ran' is not what I expected. How can I get 'pickle' and 'run' for these words without having to specify 'v' (verb) for each of them?

Upvotes: 2

Views: 4484

Answers (1)

Sriram Sitharaman

Reputation: 857

There is no direct way to do this, but one workaround is to call lemmatize() three times, once with 'v', once with 'n', and once with no part-of-speech argument, and take the most common result. Note that lemmatize() defaults to pos='n', so the call without an argument behaves like the 'n' call.

You can try the following code to get the base form of a noun or a verb:

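from nltk.stem import WordNetLemmatizer

def most_common(lst):
    # Return the element that occurs most often in lst.
    return max(set(lst), key=lst.count)

lem = WordNetLemmatizer()
words = ['ran', 'pickled', 'cats', 'crying', 'died', 'raisins', 'had']
for word in words:
    # Candidates: verb lemma, noun lemma, and the default (pos='n') lemma.
    check_list = [lem.lemmatize(word, 'v'), lem.lemmatize(word, 'n'), lem.lemmatize(word)]
    print(most_common(check_list))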

The output is:

ran
pickled
cat
cry
died
raisin
had
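
If the goal is to get 'run' and 'pickle' without naming the part of speech for each word, another option is to try every part of speech in turn and keep the first lemma that differs from the input. This is only a rough sketch, not an NLTK feature: lemmatize_any_pos and the toy lookup table are hypothetical names made up for illustration, and the 'v', 'n', 'a', 'r' order is an assumption you may want to change.

```python
def lemmatize_any_pos(word, lemmatize, pos_order=("v", "n", "a", "r")):
    """Return the first lemma that differs from `word`, trying each POS in turn.

    `lemmatize` is any callable taking (word, pos), for example
    WordNetLemmatizer().lemmatize. If no POS changes the word, the word
    is returned unchanged.
    """
    for pos in pos_order:
        lemma = lemmatize(word, pos)
        if lemma != word:
            return lemma
    return word


# Toy stand-in for a real lemmatizer (hypothetical data), so the sketch
# runs without NLTK or its WordNet corpus installed.
TOY_LEMMAS = {("ran", "v"): "run", ("pickled", "v"): "pickle", ("cats", "n"): "cat"}


def toy_lemmatize(word, pos):
    # Look up (word, pos); fall back to the word itself, as a lemmatizer would.
    return TOY_LEMMAS.get((word, pos), word)


for w in ["ran", "pickled", "cats", "had"]:
    print(w, "->", lemmatize_any_pos(w, toy_lemmatize))
```

With NLTK installed you would pass WordNetLemmatizer().lemmatize instead of toy_lemmatize. Trying 'v' first can over-reduce words that are meant as nouns (for example, 'meeting' may come back as 'meet'), so when you have full sentences, tagging the part of speech first is likely to be more accurate.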

Upvotes: 2
