Anita

Reputation: 285

Define custom lemmas and add them to WordNetLemmatizer

I would like to add some exceptions to the lemmatization results. For example, wnl.lemmatize('cookies') returns cooky instead of cookie. How can I make it return cookie?

import nltk
from nltk.tokenize import word_tokenize
from nltk import pos_tag
from nltk.stem import WordNetLemmatizer
wnl = WordNetLemmatizer()

def text_cleaning(text):
  text = text.lower()
  # Note: pos_tag returns Penn Treebank tags by default (NN, VB, JJ, RB, ...), so only 'n' and 'v' ever match here
  tok_list = [wnl.lemmatize(w,tag[0].lower()) if tag[0].lower() in ['a','n','v'] else wnl.lemmatize(w) for w,tag in pos_tag(word_tokenize(text))]
  return ' '.join(tok_list)

Upvotes: 0

Views: 206

Answers (1)

IanQ

Reputation: 2109

Looking through the implementation found here, you can probably do something like:

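from nltk.stem import WordNetLemmatizer

class WNWrapper(WordNetLemmatizer):
    def __init__(self, custom_transforms):
        super().__init__()
        # Maps a surface form to the lemma it should produce, e.g. {'cookies': 'cookie'}
        self.custom_transforms = custom_transforms

    def lemmatize(self, word, pos='n'):
        # Custom entries take priority over the WordNet lookup
        if word in self.custom_transforms:
            return self.custom_transforms[word]
        return super().lemmatize(word, pos)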

but this only works when

1) you know in advance which words you want to change (or leave unchanged)

2) there are only a few of them. A hand-maintained dictionary like this doesn't scale to a large vocabulary
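For illustration, here is a minimal sketch of the same override idea using composition instead of subclassing. The names OverrideLemmatizer and StubLemmatizer are made up for this example and are not part of NLTK. The stub only stands in for WordNetLemmatizer so the snippet runs without the WordNet data, and it imitates the cooky result reported in the question.

```python
class OverrideLemmatizer:
    """Check a dict of hand-picked lemmas before asking the base lemmatizer."""

    def __init__(self, base, overrides):
        self.base = base            # any object exposing lemmatize(word, pos)
        self.overrides = overrides  # e.g. {"cookies": "cookie"}

    def lemmatize(self, word, pos="n"):
        # Hand-picked entries win; everything else is delegated unchanged.
        if word in self.overrides:
            return self.overrides[word]
        return self.base.lemmatize(word, pos)


class StubLemmatizer:
    """Hypothetical stand-in for WordNetLemmatizer (assumption, not NLTK code)."""

    def lemmatize(self, word, pos="n"):
        # Imitates the behaviour reported in the question for 'cookies';
        # every other word is returned as-is.
        return "cooky" if word == "cookies" else word


wnl = OverrideLemmatizer(StubLemmatizer(), {"cookies": "cookie"})
print(wnl.lemmatize("cookies"))       # cookie
print(wnl.lemmatize("running", "v"))  # running (the stub returns it unchanged)
```

In real use you would presumably pass WordNetLemmatizer() as base instead of the stub. Because lemmatize keeps the (word, pos) signature, the wrapper should drop into the text_cleaning function from the question without other changes.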

Upvotes: 1
