jan

Reputation: 211

Categorize a word / get its hypernym using WordNet in Python

In my project, I have to find the category (hypernym) of a specific word.

For example, if I type "sushi" or "lion", the output should be "food" or "animal", respectively. The main goal is to categorize the word. How can I do this using NLTK and WordNet in Python?

Upvotes: 0

Views: 2633

Answers (1)

WolfgangK

Reputation: 993

I am not sure your goal is achievable with an out-of-the-box solution, since the level of abstraction you need is quite high. In NLTK/WordNet terms, you are looking for the hypernym (supertype/superordinate) of a word. For example, the hypernym of "sushi" might be "seafood" at the first level, whereas "apple" might be just a "fruit". You will probably have to go up several levels of hypernyms to arrive at your desired output. As a starting point for getting the hypernyms, you can use this code (see "All synonyms for word in python?"):

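from itertools import chain

from nltk.corpus import wordnet as wn  # requires the WordNet corpus: nltk.download('wordnet')

# Print every sense (synset) of the word together with its direct hypernyms.
for i, synset in enumerate(wn.synsets('apple')):
    print('Meaning', i, 'NLTK ID', synset.name())
    print('Definition:', synset.definition())
    # Flatten the lemma names of all direct hypernym synsets into one sequence.
    hypernym_names = chain.from_iterable(h.lemma_names() for h in synset.hypernyms())
    print('Hypernyms:', ', '.join(hypernym_names))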

Note also that a single word can have several meanings, each with different hypernyms, which further complicates your task.
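
Going up several levels can be sketched roughly as follows. This is only a minimal sketch, not a definitive implementation: the helper names first_matching_category and categorize are hypothetical, and the set of target category names you pass in (for example 'food.n.01' or 'animal.n.01') is an assumption that you would have to check against your WordNet version. The walk only relies on the synset methods name() and hypernyms() already used above.

```python
from collections import deque


def first_matching_category(synset, categories):
    """Walk up the hypernym graph breadth-first, starting at `synset`.

    Returns the name of the nearest synset (including `synset` itself)
    whose name is in `categories`, or None if none is reachable.
    `categories` is a set of synset names chosen by the caller (an assumption).
    """
    seen = set()
    queue = deque([synset])
    while queue:
        current = queue.popleft()
        name = current.name()
        if name in seen:
            continue  # a synset can be reached through more than one path
        seen.add(name)
        if name in categories:
            return name
        queue.extend(current.hypernyms())
    return None


def categorize(word, categories):
    """Hypothetical helper: try each noun sense of `word` in WordNet order."""
    # Imported here so the walk above can be used without NLTK installed;
    # needs the WordNet corpus (nltk.download('wordnet')).
    from nltk.corpus import wordnet as wn

    for synset in wn.synsets(word, pos=wn.NOUN):
        match = first_matching_category(synset, categories)
        if match is not None:
            return match
    return None
```

You would call it like categorize('sushi', {'food.n.01', 'animal.n.01'}). Because the senses are tried in WordNet's order, an ambiguous word is assigned the category of the first sense that matches, which may not be the one you intend.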

EDIT

Actually, there is an out-of-the-box solution to this problem, the synset method lowest_common_hypernyms:

wn.synset('apple.n.01').lowest_common_hypernyms(wn.synset('sushi.n.01'))

While this function is pretty nice, it does not necessarily return the most obvious solution. Here, it returns [Synset('matter.n.03')].

Upvotes: 3
