Reputation: 518
I am playing with WordNet and try to solve a NLP task.
I was wondering if there exists any way to get a list of words belonging to some large sets, such as "animals" (i.e. dog, cat, cow etc.), "countries", "electronics" etc.
I believe that it should be possible to somehow get this list by exploiting hypernyms.
Bonus question: do you know any other way to classify words in very large classes, besides "noun", "adjective" and "verb"? For example, classes like, "prepositions", "conjunctions" etc.
Upvotes: 1
Views: 762
Reputation: 518
With some help from polm23, I found this solution, which exploits similarity between words, and prevents wrong results when the class name is ambiguous.
The idea is that WordNet can be used to compare a list words
, with the string animal
, and compute a similarity score. From the nltk.org webpage:
Wu-Palmer Similarity: Return a score denoting how similar two word senses are, based on the depth of the two senses in the taxonomy and that of their Least Common Subsumer (most specific ancestor node).
def keep_similar(words, similarity_thr):
similar_words=[]
w2 = wn.synset('animal.n.01')
[similar_words.append(word) for word in words if wn.synset(word + '.n.01').wup_similarity(w2) > similarity_thr ]
return similar_words
For example, if word_list = ['dog', 'car', 'train', 'dinosaur', 'London', 'cheese', 'radon']
, the corresponding scores are:
0.875
0.4444444444444444
0.5
0.7
0.3333333333333333
0.3076923076923077
0.3076923076923077
This can easily be used to generate a list of animals, by setting a proper value of similarity_thr
Upvotes: 1
Reputation: 15623
Yes, you just check if the category is a hypernym of the given word.
from nltk.corpus import wordnet as wn
def has_hypernym(word, category):
# Assume the category always uses the most popular sense
cat_syn = wn.synsets(category)[0]
# For the input, check all senses
for syn in wn.synsets(word):
for match in syn.lowest_common_hypernyms(cat_syn):
if match == cat_syn:
return True
return False
has_hypernym('dog', 'animal') # => True
has_hypernym('bucket', 'animal') # => False
If the broader word (the "category" here) is the lowest common hypernym, that means it's a direct hypernym of the query word, so the query word is in the category.
Regarding your bonus question, I have no idea what you mean. Maybe you should look at NER or open a new question.
Upvotes: 2