Reputation: 4419
I'd like to write a simple function to see if a word "exists" in WordNet via NLTK.
import nltk
from nltk.corpus import wordnet as wn

def is_known(word):
    """Return True if this word "exists" in WordNet
    (or at least in nltk.corpus.stopwords)."""
    if word.lower() in nltk.corpus.stopwords.words('english'):
        return True
    synset = wn.synsets(word)
    if len(synset) == 0:
        return False
    else:
        return True
Why would words like could, since, without, and although return False? Don't they appear in WordNet? Is there a better way to find out whether a word exists in WordNet (using NLTK)?
My first try was to eliminate "stopwords", which are words like to, if, when, then, I, you, but there are still very common words (like could) which I can't find.
Upvotes: 1
Views: 2307
Reputation: 122338
You can try to extract all the lemmas in WordNet and then check against that set:
from itertools import chain
from nltk.corpus import wordnet as wn

# Build the set of every lemma name once; lemma_names() is a method in NLTK 3.
all_lemmas = set(chain.from_iterable(s.lemma_names() for s in wn.all_synsets()))

def in_wordnet(word):
    return word in all_lemmas

print(in_wordnet('can'))
print(in_wordnet('could'))
[out]:
True
False
Do note that WordNet contains lemmas and not words. Also note that a word/lemma can be polysemous and is not always a content word, e.g.
"I can foo bar." vs. "The water can is heavy."
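Because the lookup is against lemma names rather than surface words, it may help to normalize the input first: NLTK's WordNet lemma names keep their capitalization and join multiword expressions with underscores. Here is a minimal sketch of that idea. The helper names (load_wordnet_lemmas, normalize, in_lemmas) are my own and not part of NLTK, and the lemma set is passed in explicitly so the check can be tried on a small hand-made set without the corpus installed.

```python
def load_wordnet_lemmas():
    """Collect every WordNet lemma name, lowercased.

    Assumes NLTK is installed and the 'wordnet' corpus has been downloaded.
    """
    from nltk.corpus import wordnet as wn  # imported lazily, only when called
    return {name.lower()
            for synset in wn.all_synsets()
            for name in synset.lemma_names()}


def normalize(word):
    """Lowercase and join whitespace-separated parts with underscores."""
    return "_".join(word.lower().split())


def in_lemmas(word, lemmas):
    """Return True if the normalized word is in the (lowercased) lemma set."""
    return normalize(word) in lemmas
```

With the real corpus you would build the set once, e.g. lemmas = load_wordnet_lemmas(), and then call in_lemmas("hot dog", lemmas). Lowercasing everything trades away the distinction between proper nouns and common nouns, so skip that step if the distinction matters to you.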
Upvotes: 0
Reputation: 11601
WordNet does not contain these words or words like them. For an explanation, see the following from the WordNet docs:
Q. Why is WordNet missing: of, an, the, and, about, above, because, etc.
A. WordNet only contains "open-class words": nouns, verbs, adjectives, and adverbs. Thus, excluded words include determiners, prepositions, pronouns, conjunctions, and particles.
You also won't find these kinds of words in the online version of WordNet.
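To see the open-class restriction for a particular word, you can list the parts of speech of its synsets. Below is a rough sketch: open_class_tags and POS_NAMES are hypothetical names of mine, and the synset lookup is passed in as an argument (with NLTK that would be wn.synsets), which is just a choice to keep the function easy to try out.

```python
# NLTK's WordNet POS codes; 's' marks a "satellite" adjective.
POS_NAMES = {
    "n": "noun",
    "v": "verb",
    "a": "adjective",
    "s": "adjective",
    "r": "adverb",
}


def open_class_tags(word, synsets):
    """Return the set of open-class categories found for `word`.

    `synsets` is any callable mapping a word to a list of objects with a
    pos() method, for example nltk.corpus.wordnet.synsets.
    An empty set means the lookup returned no synsets for the word.
    """
    return {POS_NAMES[s.pos()] for s in synsets(word)}
```

Usage with NLTK, assuming the corpus is installed, would be: from nltk.corpus import wordnet as wn, then open_class_tags("can", wn.synsets). An empty result for a common word is a hint that it is a closed-class word rather than a typo.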
Upvotes: 7