Mahfuja nilufar

Reputation: 337

What is the time complexity of using NLTK WordNetLemmatizer?

I am a little confused about the time complexity of using WordNetLemmatizer. From the NLTK site: English WordNet provides 155,287 words and 117,659 synonym sets.

Source code:

import nltk
from nltk.stem import WordNetLemmatizer 

# Init the Wordnet Lemmatizer
lemmatizer = WordNetLemmatizer()

# Lemmatize Single Word
print(lemmatizer.lemmatize("bats"))

From the NLTK source code, I got the following implementation of the lemmatize() method:

def lemmatize(self, word: str, pos: str = "n") -> str:
    
    lemmas = wn._morphy(word, pos)
    return min(lemmas, key=len) if lemmas else word

What would be the time complexity of the code above? How does the wn._morphy() function work?

When I pass a word, does it search for lemmas among all 155,287 words? In that case, can I say the time complexity is O(K), where K is the WordNet size (155,287 words)?
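
For reference, wn._morphy() can also be called directly to inspect the candidate lemmas it produces (the exact output depends on the installed WordNet data):

from nltk.corpus import wordnet as wn

# Inspect the candidate lemmas that _morphy finds for a word;
# lemmatize() simply returns the shortest of these candidates.
print(wn._morphy("bats", "n"))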

Upvotes: 1

Views: 315

Answers (1)

Darren Cook

Reputation: 28928

In answer to how wn._morphy() works, here is the source code:

https://github.com/nltk/wordnet/blob/master/wn/morphy.py

As far as I can see it is O(MP), where M is the number of morphological variants it tries, and P is the number of entries in POS_LIST (i.e. P is 1 if you give an explicit POS code to try).
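
As a rough sketch of the idea (simplified, not the actual NLTK code; the rule table and known_words set below are stand-ins for WordNet's real substitution rules and its _lemma_pos_offset_map): for the given POS it applies a fixed list of suffix-substitution rules to the word and keeps every candidate found in the dictionary, so the work grows with the number of variants tried, not with the dictionary size.

# Simplified sketch of the morphy idea -- not the real NLTK implementation.
MORPHOLOGICAL_SUBSTITUTIONS = {
    "n": [("s", ""), ("ses", "s"), ("xes", "x"), ("zes", "z"),
          ("ches", "ch"), ("shes", "sh"), ("men", "man"), ("ies", "y")],
}

def morphy_sketch(word, pos, known_words):
    candidates = []
    # M = number of substitution rules tried for this POS
    for old_suffix, new_suffix in MORPHOLOGICAL_SUBSTITUTIONS[pos]:
        if word.endswith(old_suffix):
            candidate = word[: len(word) - len(old_suffix)] + new_suffix
            # O(1) average-case membership test against a dict/set of known lemmas
            if candidate in known_words:
                candidates.append(candidate)
    if word in known_words:
        candidates.append(word)
    return candidates

# Toy dictionary standing in for the 155,287-word lemma map:
print(morphy_sketch("bats", "n", {"bat", "bats"}))   # ['bat', 'bats']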

The actual lookup is O(1), not O(K), because the words are kept in _lemma_pos_offset_map, which is a Python dictionary.
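
You can check that map yourself (it is an internal attribute, so its exact form may change between NLTK versions):

from nltk.corpus import wordnet as wn

# _lemma_pos_offset_map is a plain Python dict keyed by lemma name,
# so a membership test or lookup is O(1) on average -- no scan over all entries.
print(type(wn._lemma_pos_offset_map))
print("bat" in wn._lemma_pos_offset_map)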

So I think O(1) is the answer you are looking for: whether WordNet has 100,000, 155,287, or 200,000 words, the run-time of morphy() won't change.
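
If you want to convince yourself empirically, you could time lemmatize() on a few different words (timings vary by machine and are not shown here; the point is only that per-word cost depends on the variants tried, not on the vocabulary size):

import timeit
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
lemmatizer.lemmatize("bats")  # warm-up: the first call lazily loads the WordNet data

for word in ["bats", "feet", "running", "abaci"]:
    t = timeit.timeit(lambda: lemmatizer.lemmatize(word), number=10_000)
    print(f"{word}: {t:.3f}s for 10,000 calls")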

Upvotes: 1
