Reputation: 337
I am a little bit confused about the time complexity of using WordNetLemmatizer. From the NLTK site: NLTK provides 155,287 words and 117,659 synonym sets from the English WordNet.
Source code:
import nltk
from nltk.stem import WordNetLemmatizer
# Init the Wordnet Lemmatizer
lemmatizer = WordNetLemmatizer()
# Lemmatize Single Word
print(lemmatizer.lemmatize("bats"))
From the NLTK source code, I found the following implementation of the lemmatize() method:
def lemmatize(self, word: str, pos: str = "n") -> str:
    lemmas = wn._morphy(word, pos)
    return min(lemmas, key=len) if lemmas else word
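For reference, calling wn._morphy() directly shows the candidate lemmas that lemmatize() picks the shortest of (output may vary with the installed WordNet data):
from nltk.corpus import wordnet as wn
# _morphy returns the candidate lemmas for a word and POS;
# lemmatize() then returns the shortest one.
print(wn._morphy("bats", "n"))      # likely ['bat']
print(wn._morphy("churches", "n"))  # likely ['church']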
For the lemmatize() method above, what would the time complexity be? How does the wn._morphy() function work?
Is it that, when I pass a word, it searches for lemmas among the 155,287 words? In that case, can I say the time complexity is O(K), where K is the WordNet size (155,287 words)?
Upvotes: 1
Views: 315
Reputation: 28928
In answer to how wn._morphy() works, here is the source code: https://github.com/nltk/wordnet/blob/master/wn/morphy.py
As far as I can see it is O(MP), where M is the number of morphological variants it tries and P is the number of entries in POS_LIST (i.e. P is 1 if you give an explicit POS code to try).
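To make the O(MP) shape concrete, here is a simplified sketch (not the actual NLTK/wn code; the rule table below is invented for illustration):
POS_LIST = ["n", "v", "a", "r"]  # P possible parts of speech to try

# Hypothetical suffix-substitution rules; the real tables live in the
# WordNet reader and are longer, but the shape is the same.
SUBSTITUTIONS = {
    "n": [("s", ""), ("ses", "s"), ("ies", "y"), ("ches", "ch")],
    "v": [("s", ""), ("ies", "y"), ("es", "e"), ("ed", ""), ("ing", "")],
    "a": [("er", ""), ("est", ""), ("er", "e"), ("est", "e")],
    "r": [],
}

def morphy_sketch(word, lemma_map, pos=None):
    pos_tags = [pos] if pos else POS_LIST      # P = 1 if pos is given
    results = []
    for p in pos_tags:                         # P iterations
        # M candidate variants generated from the suffix rules
        candidates = [word] + [
            word[: -len(old)] + new
            for old, new in SUBSTITUTIONS[p]
            if word.endswith(old)
        ]
        for cand in candidates:
            if cand in lemma_map:              # O(1) dict/set lookup per candidate
                results.append(cand)
    return results

# e.g. morphy_sketch("bats", {"bat"}, pos="n") -> ["bat"]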
The actual lookup is O(1), not O(K), because the words are kept in _lemma_pos_offset_map, which is a Python dictionary.
So I think O(1) is the answer you are looking for: whether WordNet has 100,000, 155,287, or 200,000 words, the run-time of morphy() won't change.
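If you want to check that yourself, you can poke at the dictionary directly (a small probe; _lemma_pos_offset_map is an NLTK internal, so the name could change between versions):
from nltk.corpus import wordnet as wn

# Each candidate form is checked against this dict; average-case dict
# membership is O(1) no matter how many lemmas WordNet contains.
print(len(wn._lemma_pos_offset_map))       # number of lemma entries, on the order of 150k
print("bat" in wn._lemma_pos_offset_map)   # True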
Upvotes: 1