Reputation: 351
I'm trying to lemmatize words in a text. For example, 'pickled' should become 'pickle', 'ran' should become 'run', 'raisins' should become 'raisin', and so on.
I'm using NLTK's WordNetLemmatizer as follows:
>>> from nltk.stem import WordNetLemmatizer
>>> lem = WordNetLemmatizer()
>>> print(lem.lemmatize("cats"))
cat
>>> print(lem.lemmatize("pickled"))
pickled
>>> print(lem.lemmatize("ran"))
ran
So, as you can see, for 'pickled' and 'ran' the output is not what I expected. How can I get 'pickle' and 'run' for these without having to specify 'v' (verb) etc. for each word?
Upvotes: 2
Views: 4484
Reputation: 857
You can approximate the base form of a noun or a verb by calling lemmatize() several times (once with 'v', once with 'n', and once without a part-of-speech argument) and taking the most common result.
This is not a direct solution, but you can try the following code to get the base form of a noun or a verb:
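from nltk.stem import WordNetLemmatizer

def most_common(lst):
    # Return the element that occurs most often in lst.
    return max(set(lst), key=lst.count)

lem = WordNetLemmatizer()
words = ['ran', 'pickled', 'cats', 'crying', 'died', 'raisins', 'had']
for word in words:
    # Candidates: verb, noun, and the default call (which also treats the word as a noun).
    checkList = [lem.lemmatize(word, 'v'), lem.lemmatize(word, 'n'), lem.lemmatize(word)]
    print(most_common(checkList))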
This prints the following base forms:
ran
pickled
cat
cry
died
raisin
had
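Note that the vote leaves 'ran' and 'pickled' unchanged, because the two noun-style calls outvote the single verb call. As a rough sketch of the voting step in isolation, and of one possible variant that instead prefers any candidate differing from the input, consider the following. The candidate list is written by hand here (assuming the 'v' call yields 'run' and the noun calls yield 'ran', as the question implies), and prefer_changed is a hypothetical helper, not part of NLTK:

```python
from collections import Counter

def most_common(candidates):
    # Majority vote; ties go to the candidate seen first.
    return Counter(candidates).most_common(1)[0][0]

def prefer_changed(word, candidates):
    # Hypothetical variant: return the first candidate that differs
    # from the input word, falling back to the word itself.
    for candidate in candidates:
        if candidate != word:
            return candidate
    return word

# Hand-written candidates in the order (verb, noun, default) for 'ran'.
# With NLTK these would come from lemmatize(word, 'v'), lemmatize(word, 'n')
# and lemmatize(word); they are hard-coded so the sketch runs without WordNet data.
ran_candidates = ["run", "ran", "ran"]

print(most_common(ran_candidates))            # ran
print(prefer_changed("ran", ran_candidates))  # run
```

The trade-off is that preferring a changed form may over-reduce a word that is already a valid noun lemma but also happens to be a verb inflection, so this is a heuristic rather than a substitute for knowing the part of speech.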
Upvotes: 2