Phys

Reputation: 518

Is it possible to get word classes from the WordNet dataset?

I am playing with WordNet and trying to solve an NLP task.

I was wondering whether there is any way to get a list of words belonging to broad categories, such as "animals" (e.g. dog, cat, cow), "countries", "electronics", etc.

I believe that it should be possible to somehow get this list by exploiting hypernyms.
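Going downward from the class is the mirror image of checking hypernyms: collect every hyponym of the class, at every depth. Below is a minimal sketch of that transitive closure over a small hypothetical taxonomy (the HYPONYMS dict and the all_hyponyms helper are made up for illustration and are not WordNet data). In NLTK, Synset.closure(lambda s: s.hyponyms()) performs a similar breadth-first walk over real synsets, and lemma_names() turns each synset into words.

```python
from collections import deque

# Hypothetical toy taxonomy, NOT real WordNet data:
# each concept maps to its direct hyponyms (more specific concepts).
HYPONYMS = {
    "animal": ["mammal", "bird"],
    "mammal": ["dog", "cat", "cow"],
    "bird": ["sparrow"],
    "dog": ["puppy"],
}


def all_hyponyms(concept):
    """Return every concept below `concept`, in breadth-first order."""
    found = []
    seen = set()
    queue = deque(HYPONYMS.get(concept, []))
    while queue:
        node = queue.popleft()
        if node in seen:  # guards against nodes reachable by two paths
            continue
        seen.add(node)
        found.append(node)
        queue.extend(HYPONYMS.get(node, []))
    return found


print(all_hyponyms("animal"))
# ['mammal', 'bird', 'dog', 'cat', 'cow', 'sparrow', 'puppy']
```

Intermediate nodes such as mammal come back too; keeping only the leaves would give the concrete words.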

Bonus question: do you know any other way to classify words into very large classes besides "noun", "adjective" and "verb"? For example, classes like "prepositions", "conjunctions", etc.
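WordNet only covers nouns, verbs, adjectives and adverbs, so prepositions and conjunctions are outside it; classes like these are what a part-of-speech tagger assigns. nltk.pos_tag, for example, returns Penn Treebank tags by default, where IN marks prepositions and subordinating conjunctions and CC marks coordinating conjunctions. The sketch below only groups tokens that are already tagged: the tagged sentence is written by hand (no tagger is run), and the coarse_class grouping is a hypothetical choice, not an official mapping.

```python
from collections import defaultdict

# Hypothetical grouping of Penn Treebank tags into coarse word classes.
EXACT = {
    "IN": "preposition/subordinating conjunction",
    "CC": "coordinating conjunction",
    "DT": "determiner",
    "PRP": "pronoun",
}
PREFIXES = {"NN": "noun", "VB": "verb", "JJ": "adjective", "RB": "adverb"}


def coarse_class(tag):
    """Map one Penn Treebank tag to a coarse class name."""
    if tag in EXACT:
        return EXACT[tag]
    for prefix, name in PREFIXES.items():
        if tag.startswith(prefix):  # e.g. NNS, VBD, JJR, RBR
            return name
    return "other"


def group_by_class(tagged):
    """Group (word, tag) pairs by coarse class, keeping first-seen order."""
    groups = defaultdict(list)
    for word, tag in tagged:
        cls = coarse_class(tag)
        if word not in groups[cls]:
            groups[cls].append(word)
    return dict(groups)


# Hand-tagged example sentence (illustrative, not real tagger output).
tagged = [("the", "DT"), ("cat", "NN"), ("sat", "VBD"), ("on", "IN"),
          ("the", "DT"), ("mat", "NN"), ("and", "CC"), ("slept", "VBD")]
print(group_by_class(tagged))
```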

Upvotes: 1

Views: 762

Answers (2)

Phys

Reputation: 518

With some help from polm23, I found this solution, which exploits similarity between words and avoids wrong results when the class name is ambiguous. The idea is to use WordNet to compare each word in a list with the synset animal.n.01 and compute a similarity score. From the nltk.org documentation:

Wu-Palmer Similarity: Return a score denoting how similar two word senses are, based on the depth of the two senses in the taxonomy and that of their Least Common Subsumer (most specific ancestor node).
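The usual form of the measure quoted above is wup(a, b) = 2 * depth(LCS) / (depth(a) + depth(b)), so a word scores high when its least common subsumer with the class sits deep in the taxonomy. A minimal sketch of that formula over a hypothetical single-parent toy taxonomy (PARENT, ancestors, lcs and wup are illustrative names; the root is counted as depth 1 here, and NLTK's exact depth counting may differ, so these numbers will not match real WordNet scores):

```python
# Hypothetical toy taxonomy, NOT real WordNet data:
# each concept maps to its single parent (None for the root).
PARENT = {
    "entity": None,
    "organism": "entity",
    "animal": "organism",
    "mammal": "animal",
    "dog": "mammal",
    "artifact": "entity",
    "vehicle": "artifact",
    "car": "vehicle",
}


def ancestors(concept):
    """Return [concept, parent, grandparent, ..., root], deepest first."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def depth(concept):
    # The root has depth 1 in this sketch.
    return len(ancestors(concept))


def lcs(a, b):
    """Least common subsumer: first ancestor of a that is also an ancestor of b."""
    b_ancestors = set(ancestors(b))
    for node in ancestors(a):
        if node in b_ancestors:
            return node
    return None  # not reached here, since every concept shares the root


def wup(a, b):
    return 2 * depth(lcs(a, b)) / (depth(a) + depth(b))


print(wup("dog", "animal"))  # 0.75: the LCS is animal itself
print(wup("car", "animal"))  # about 0.286: the LCS is only the root
```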

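from nltk.corpus import wordnet as wn

def keep_similar(words, similarity_thr):
    # Pin the class to one specific sense, so an ambiguous class name is not a problem
    w2 = wn.synset('animal.n.01')
    similar_words = []
    for word in words:
        # Use the word's first noun sense; skip words with no noun sense at all
        senses = wn.synsets(word, pos=wn.NOUN)
        if senses and senses[0].wup_similarity(w2) > similarity_thr:
            similar_words.append(word)
    return similar_words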

For example, if word_list = ['dog', 'car', 'train', 'dinosaur', 'London', 'cheese', 'radon'], the corresponding scores are:

dog       0.875
car       0.4444444444444444
train     0.5
dinosaur  0.7
London    0.3333333333333333
cheese    0.3076923076923077
radon     0.3076923076923077

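This can easily be used to generate a list of animals by choosing a suitable value for similarity_thr: with the scores above, any threshold from 0.5 up to (but not including) 0.7 keeps only dog and dinosaur.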
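To see how sensitive the result is to similarity_thr, the scores reported above can be filtered at a few thresholds. This sketch only reuses those numbers in a dict (it does not call WordNet), and keep_above is a hypothetical helper that applies the same strict > comparison as keep_similar.

```python
# Wu-Palmer scores against animal.n.01, copied from the list above.
scores = {
    "dog": 0.875,
    "car": 0.4444444444444444,
    "train": 0.5,
    "dinosaur": 0.7,
    "London": 0.3333333333333333,
    "cheese": 0.3076923076923077,
    "radon": 0.3076923076923077,
}


def keep_above(scores, threshold):
    # Same strict comparison as keep_similar: the score must exceed the threshold.
    return [word for word, score in scores.items() if score > threshold]


for thr in (0.3, 0.45, 0.6, 0.8):
    print(thr, keep_above(scores, thr))
# 0.3 keeps all seven words
# 0.45 ['dog', 'train', 'dinosaur']
# 0.6 ['dog', 'dinosaur']
# 0.8 ['dog']
```

The gap between the lowest animal score and the highest non-animal score is narrow on this list, so a threshold tuned on one word list may not carry over to another.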

Upvotes: 1

polm23

Reputation: 15623

Yes, you just check whether the category is a hypernym (at any level) of the given word.

from nltk.corpus import wordnet as wn

def has_hypernym(word, category):
    # Assume the category always uses the most popular sense
    cat_syn = wn.synsets(category)[0]

    # For the input, check all senses
    for syn in wn.synsets(word):
        for match in syn.lowest_common_hypernyms(cat_syn):
            if match == cat_syn:
                return True
    return False

has_hypernym('dog', 'animal') # => True
has_hypernym('bucket', 'animal') # => False

If the broader word (the "category" here) is the lowest common hypernym, then it is an ancestor of the query word (not necessarily its direct hypernym), so the query word is in the category.
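The category usually sits several levels above the word rather than being its immediate parent, and a concept can have more than one parent. A minimal sketch of membership as reachability along hypernym links, over a hypothetical toy taxonomy (HYPERNYMS and in_category are illustrative names, not NLTK API, and the data is not real WordNet). For real synsets, NLTK's Synset.closure(lambda s: s.hypernyms()) yields the ancestors in the same spirit.

```python
# Hypothetical toy taxonomy, NOT real WordNet data:
# each concept maps to its direct hypernyms (a concept may have several).
HYPERNYMS = {
    "dog": ["canine", "pet"],
    "canine": ["mammal"],
    "mammal": ["animal"],
    "pet": ["animal"],
    "animal": ["organism"],
    "organism": [],
    "bucket": ["container"],
    "container": ["artifact"],
    "artifact": [],
}


def in_category(word, category):
    """True if `category` is reachable from `word` by following hypernym links."""
    stack = list(HYPERNYMS.get(word, []))
    seen = set()
    while stack:
        node = stack.pop()
        if node == category:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(HYPERNYMS.get(node, []))
    return False


print("animal" in HYPERNYMS["dog"])     # False: not a direct hypernym
print(in_category("dog", "animal"))     # True: an ancestor two or three levels up
print(in_category("bucket", "animal"))  # False
```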

Regarding your bonus question, I'm not sure what you mean. Maybe you should look at named-entity recognition (NER), or open a new question.

Upvotes: 2
