From the NLTK WordNet API:
>>> from nltk.corpus import wordnet as wn
>>> for i in wn.synsets('discover'):
...     print(i, i.offset())
...
Synset('detect.v.01') 2154508
Synset('learn.v.02') 598954
Synset('discover.v.03') 1637982
Synset('discover.v.04') 721437
Synset('fall_upon.v.01') 2286687
Synset('unwrap.v.02') 933821
Synset('discover.v.07') 2128066
Synset('identify.v.05') 652346
>>> wn.synset('discover.v.8')
Synset('identify.v.05')
From the index.verb file of WordNet 3.0, we have:
discover v 8 6 @ ~ * > $ + 8 7 02154508 00598954 01637982 00721437 02286687 00933821 02128066 00652346
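That index.verb line can be read field by field. The sketch below assumes the index-file layout described in WordNet's wndb(5WN) manual page: lemma, POS, synset_cnt, p_cnt, then p_cnt pointer symbols, sense_cnt, tagsense_cnt, and finally one synset offset per sense, in sense order. It numbers the offsets the same way as the `discover.v.8` lookup above, so the 8th offset, 00652346, is the synset NLTK prints as identify.v.05. The helpers `parse_index_line` and `sense_names` are hypothetical names for illustration, not part of NLTK.

```python
def parse_index_line(line):
    """Split one line of a WordNet index.<pos> file into its fields.

    Assumed layout (wndb(5WN)):
    lemma pos synset_cnt p_cnt [ptr_symbol ...] sense_cnt tagsense_cnt offset [offset ...]
    """
    fields = line.split()
    lemma, pos = fields[0], fields[1]
    synset_cnt, p_cnt = int(fields[2]), int(fields[3])
    pointers = fields[4:4 + p_cnt]          # pointer symbols such as @ and ~
    sense_cnt = int(fields[4 + p_cnt])
    tagsense_cnt = int(fields[5 + p_cnt])
    offsets = fields[6 + p_cnt:]            # one synset offset per sense
    if len(offsets) != synset_cnt:
        raise ValueError("malformed index line: offset count mismatch")
    return {
        "lemma": lemma, "pos": pos, "pointers": pointers,
        "sense_cnt": sense_cnt, "tagsense_cnt": tagsense_cnt,
        "offsets": offsets,
    }


def sense_names(entry):
    """Map lemma-based names like 'discover.v.08' to synset offsets, in sense order."""
    return {
        f"{entry['lemma']}.{entry['pos']}.{i:02d}": offset
        for i, offset in enumerate(entry["offsets"], start=1)
    }


line = ("discover v 8 6 @ ~ * > $ + 8 7 02154508 00598954 01637982 00721437 "
        "02286687 00933821 02128066 00652346")
entry = parse_index_line(line)
names = sense_names(entry)
print(names["discover.v.08"])  # prints 00652346
```

The sense number in a lemma-based name is just a 1-based position in this offset list, which is why `discover.v.8` lands on offset 00652346.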
I have checked the WordNet API (http://www.nltk.org/_modules/nltk/corpus/reader/wordnet.html), but it doesn't say much about how discover.v.8 is mapped to identify.v.05.
Can anyone explain how the mapping occurs?
How can I extract a list of these mappings?
I'm not sure what your question is really asking. It seems you don't understand why discover is linked to identify; is that right? It happens that a WordNet synset is a "SYNonym SET", so several words are listed under a single synset.
If you look up "discover" in the WordNet browser (WNB) or in the online version, you will see that the list of synsets you get for "discover" is simply every WordNet synset that contains the word discover. Internally, NLTK names each synset after only the first word in that synset's list of words, together with that word's own sense number.
In other words, Synset('discover.v.8') is the same synset as Synset('identify.v.05'), only seen from different perspectives: the 8th verb sense of discover is also the 5th verb sense of identify. Internally both names resolve to the same synset ID (offset 00652346 in WordNet's verb data), and that is how they are related.
A list of these mappings would simply be the list of synset IDs (offsets) related to the word, in sense order, paired with the name NLTK gives each of those synsets.
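To make such a list concrete, the sketch below pairs each lemma-based name discover.v.NN with the name NLTK printed for the same synset, joining the two on the shared offset. The offset-to-name table is copied from the interactive session in the question rather than queried from WordNet; with NLTK 3 installed, the same pairs could presumably be built from `wn.synsets('discover', pos=wn.VERB)` using each synset's `name()` and `offset()` methods. The function name `alias_table` is hypothetical.

```python
# Synset offsets for the verb "discover", in sense order (index.verb, WordNet 3.0).
DISCOVER_OFFSETS = ["02154508", "00598954", "01637982", "00721437",
                    "02286687", "00933821", "02128066", "00652346"]

# Offset -> name as printed by NLTK in the question's session (copied, not queried).
NLTK_NAMES = {
    2154508: "detect.v.01", 598954: "learn.v.02", 1637982: "discover.v.03",
    721437: "discover.v.04", 2286687: "fall_upon.v.01", 933821: "unwrap.v.02",
    2128066: "discover.v.07", 652346: "identify.v.05",
}


def alias_table(lemma, pos, offsets, names_by_offset):
    """Return (lemma-based name, NLTK name) pairs, joined on the synset offset."""
    table = []
    for sense, offset in enumerate(offsets, start=1):
        alias = f"{lemma}.{pos}.{sense:02d}"
        # index.verb zero-pads offsets to 8 digits; the session printed plain ints.
        table.append((alias, names_by_offset[int(offset)]))
    return table


for alias, canonical in alias_table("discover", "v", DISCOVER_OFFSETS, NLTK_NAMES):
    note = "" if alias == canonical else "  (named after another word)"
    print(f"{alias} -> {canonical}{note}")
```

Five of the eight senses come out under a different word's name; under the naming rule described above, those are the synsets in which discover is not the first word listed.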