Reputation: 211
In my project I have to find the category (hypernym) of a given word.
For example, if I type "sushi" or "lion", the output should be "food" or "animal" respectively. The main idea is to categorize the word. How can I do this using nltk and WordNet in Python?
Upvotes: 0
Views: 2633
Reputation: 993
I am unsure whether your goal is achievable with an out-of-the-box solution, since the abstraction level needed is quite high. In terms of nltk/WordNet, you are looking for the hypernym (supertype/superordinate) of a word. For example, the hypernym of "sushi" might be "seafood" on the first level, whereas "apple" might be just a "fruit". You will probably have to go up several levels of hypernyms to arrive at your desired output. As a starting point for getting the hypernyms, you can use this code (see "All synonyms for word in python?"):
from itertools import chain
from nltk.corpus import wordnet as wn

# Print every sense (synset) of the word together with its direct hypernyms
for i, synset in enumerate(wn.synsets('apple')):
    print('Meaning', i, 'NLTK ID', synset.name())
    print('Definition:', synset.definition())
    hypernym_names = chain.from_iterable(h.lemma_names() for h in synset.hypernyms())
    print('Hypernyms:', ', '.join(hypernym_names))
Note also that a single word can have several meanings, each with different hypernyms, which further complicates your task.
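Going up "several levels" can be sketched as a breadth-first walk over hypernym links that stops at the first node found in a set of categories you care about. The following is only a rough sketch under a few assumptions: `find_category`, `wordnet_category`, and the toy taxonomy are hypothetical names made up for illustration, and the target synsets (`food.n.01`, `food.n.02`, `animal.n.01`) are just one guess at what "food" and "animal" should mean for your project.

```python
from collections import deque


def find_category(start, parents_of, targets):
    """Breadth-first walk up a taxonomy.

    Returns the nearest node (fewest hypernym steps) that is in `targets`,
    or None if no target is reachable. `parents_of` is any callable that
    returns the direct parents (hypernyms) of a node.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node in targets:
            return node
        for parent in parents_of(node):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return None


# Hand-written toy taxonomy standing in for WordNet (illustration only).
TOY_TAXONOMY = {
    "sushi": ["dish"],
    "dish": ["food"],
    "lion": ["big_cat"],
    "big_cat": ["feline"],
    "feline": ["animal"],
}


def toy_parents(node):
    return TOY_TAXONOMY.get(node, [])


print(find_category("sushi", toy_parents, {"food", "animal"}))
print(find_category("lion", toy_parents, {"food", "animal"}))


def wordnet_category(word, target_names=("food.n.01", "food.n.02", "animal.n.01")):
    """Same walk on real WordNet data; needs nltk and the WordNet corpus.

    Tries each noun sense of `word` in order and returns the name of the
    first target synset reached, or None.
    """
    from nltk.corpus import wordnet as wn  # imported lazily on purpose

    targets = {wn.synset(name) for name in target_names}
    for synset in wn.synsets(word, pos=wn.NOUN):
        hit = find_category(synset, lambda s: s.hypernyms(), targets)
        if hit is not None:
            return hit.name()
    return None
```

Because the walk is tried sense by sense, the result depends on the order in which WordNet lists the senses, so an ambiguous word may land in a category you did not expect. The list of target synsets is the part you would have to tune by hand.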
EDIT
Actually, there is an out-of-the-box method for this problem called lowest_common_hypernyms:
wn.synset('apple.n.01').lowest_common_hypernyms(wn.synset('sushi.n.01'))
While this function is pretty nice, it does not necessarily return the most obvious solution. Here, it returns [Synset('matter.n.03')].
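To see why the result can be so general, it may help to look at what a "lowest common hypernym" is on a tiny graph. The sketch below is not nltk's implementation; it is a simplified re-creation on a hand-written toy taxonomy in which the two words hang under two different "food" nodes that only meet at "matter". All node names and the helper functions (`ancestors`, `depth`, `lowest_common`) are hypothetical and are not WordNet identifiers.

```python
# Toy parent links (child -> direct hypernyms); illustration only.
PARENTS = {
    "apple": ["edible_fruit"],
    "edible_fruit": ["produce"],
    "produce": ["food_solid"],
    "food_solid": ["matter"],
    "sushi": ["dish"],
    "dish": ["nutriment"],
    "nutriment": ["food_substance"],
    "food_substance": ["matter"],
    "matter": ["entity"],
}


def ancestors(node, parents):
    """Return the node itself plus everything reachable via parent links."""
    found = {node}
    stack = [node]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def depth(node, parents):
    """Length of the longest path from `node` up to a root."""
    direct = parents.get(node, [])
    if not direct:
        return 0
    return 1 + max(depth(p, parents) for p in direct)


def lowest_common(a, b, parents):
    """Shared ancestors of `a` and `b` that lie deepest in the taxonomy."""
    common = ancestors(a, parents) & ancestors(b, parents)
    if not common:
        return []
    deepest = max(depth(n, parents) for n in common)
    return sorted(n for n in common if depth(n, parents) == deepest)


print(lowest_common("apple", "sushi", PARENTS))
```

In this toy graph the only shared ancestors are "matter" and "entity", so the deepest one, "matter", is returned even though both words are intuitively "food". If something similar happens in WordNet, comparing against a fixed set of category synsets may give more useful results than comparing two words with each other.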
Upvotes: 3