modarwish

Reputation: 495

How can I retrieve the antonym synset of a target synset in NLTK's Wordnet?

I have successfully retrieved synsets connected to a base synset via other semantic relations, as follows:

wn.synset('good.a.01').also_sees()
Out[63]: 
[Synset('best.a.01'),
 Synset('better.a.01'),
 Synset('favorable.a.01'),
 Synset('good.a.03'),
 Synset('obedient.a.01'),
 Synset('respectable.a.01')]

wn.synset('good.a.01').similar_tos()
Out[64]: 
[Synset('bang-up.s.01'),
 Synset('good_enough.s.01'),
 Synset('goodish.s.01'),
 Synset('hot.s.15'),
 Synset('redeeming.s.02'),
 Synset('satisfactory.s.02'),
 Synset('solid.s.01'),
 Synset('superb.s.02'),
 Synset('well-behaved.s.01')]

However, the antonym relation seems to work differently. I managed to retrieve the lemma connected to my base synset, but not the actual synset:

wn.synset('good.a.01').lemmas()[0].antonyms()
Out[67]: [Lemma('bad.a.01.bad')]

How can I get the synset (not the lemma) that is connected via antonymy to my base synset, wn.synset('good.a.01')? TIA

Upvotes: 2

Views: 876

Answers (1)

alvas

Reputation: 122082

For some reason, WordNet indexes antonymy relations at the Lemma level instead of the Synset level (see http://wordnetweb.princeton.edu/perl/webwn?o2=&o0=1&o8=1&o1=1&o7=&o5=&o9=&o6=&o3=&o4=&s=good&i=8&h=00001000000000000000000000000000#c), so the question is whether Synsets and Lemmas have a many-to-many or a one-to-one relation.
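To make the lemma-level storage concrete, here is a toy model that does not use NLTK at all. The dictionaries LEMMAS, LEMMA_TO_SYNSET and ANTONYM_POINTERS and the function synset_antonyms are hypothetical names invented for illustration, and the only data is the good/bad pair from the question. It is a sketch of the idea, not how NLTK stores things internally; in NLTK the pointers are reached through Lemma.antonyms().

```python
# Toy model (hypothetical names): antonym pointers are keyed by lemma, not by synset.
# Lemma keys follow the "<synset name>.<lemma name>" form seen in NLTK's output.
LEMMAS = {
    "good.a.01": ["good.a.01.good"],
    "bad.a.01": ["bad.a.01.bad"],
}
LEMMA_TO_SYNSET = {
    "good.a.01.good": "good.a.01",
    "bad.a.01.bad": "bad.a.01",
}
ANTONYM_POINTERS = {
    "good.a.01.good": ["bad.a.01.bad"],
    "bad.a.01.bad": ["good.a.01.good"],
}


def synset_antonyms(synset_name):
    """Lift lemma-level antonym pointers up to the synset level."""
    result = set()
    for lemma_key in LEMMAS.get(synset_name, []):
        for antonym_key in ANTONYM_POINTERS.get(lemma_key, []):
            # each lemma belongs to exactly one synset, so this lookup is unambiguous
            result.add(LEMMA_TO_SYNSET[antonym_key])
    return result


# A synset name is not a key in the pointer table: the relation lives on lemmas.
print("good.a.01" in ANTONYM_POINTERS)  # False
print(synset_antonyms("good.a.01"))     # {'bad.a.01'}
```

The lifting step only works because every lemma maps to a single synset, which is exactly the property examined in the rest of this answer.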


In the case of ambiguous words (one word, many meanings), we have a one-to-many relation from String to Synset, e.g.:

>>> wn.synsets('dog')
[Synset('dog.n.01'), Synset('frump.n.01'), Synset('dog.n.03'), Synset('cad.n.01'), Synset('frank.n.02'), Synset('pawl.n.01'), Synset('andiron.n.01'), Synset('chase.v.01')]

In the case of one meaning/concept with multiple representations, we have a one-to-many relation from Synset to String (where String refers to Lemma names):

>>> dog = wn.synset('dog.n.1')
>>> dog.definition()
u'a member of the genus Canis (probably descended from the common wolf) that has been domesticated by man since prehistoric times; occurs in many breeds'
>>> dog.lemma_names()
[u'dog', u'domestic_dog', u'Canis_familiaris']
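The two one-to-many directions above are two views of one many-to-many relation, where each (synset name, string) pair corresponds to what NLTK calls a Lemma. A toy sketch: the list PAIRS is hand-copied from the outputs in this answer (the eight synsets returned for 'dog', each of which contains the string 'dog', plus the other two lemma names of dog.n.01). All other lemma names of those synsets are deliberately omitted, and the variable names are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: one (synset name, lemma name) pair per Lemma.
# Only the lemma names shown in this answer are included; the real synsets have more.
DOG_SYNSETS = ["dog.n.01", "frump.n.01", "dog.n.03", "cad.n.01",
               "frank.n.02", "pawl.n.01", "andiron.n.01", "chase.v.01"]
PAIRS = [(s, "dog") for s in DOG_SYNSETS] + [
    ("dog.n.01", "domestic_dog"),
    ("dog.n.01", "Canis_familiaris"),
]

string_to_synsets = defaultdict(list)  # String -> Synsets (one word, many meanings)
synset_to_strings = defaultdict(list)  # Synset -> Strings (one meaning, many words)
for synset_name, lemma_name in PAIRS:
    string_to_synsets[lemma_name].append(synset_name)
    synset_to_strings[synset_name].append(lemma_name)

print(len(string_to_synsets["dog"]))  # 8
print(synset_to_strings["dog.n.01"])  # ['dog', 'domestic_dog', 'Canis_familiaris']
```

Both indexes are built from the same pair list, which is why String and Synset end up many-to-many while each individual pair stays fixed to one synset and one string.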

Note: so far, we have compared the relationships between Strings and Synsets, not between Lemmas and Synsets.


The "cute" thing is that Lemma and String have a one-to-one relationship:

>>> wn.synsets('dog')
[Synset('dog.n.01'), Synset('frump.n.01'), Synset('dog.n.03'), Synset('cad.n.01'), Synset('frank.n.02'), Synset('pawl.n.01'), Synset('andiron.n.01'), Synset('chase.v.01')]
>>> wn.synsets('dog')[0]
Synset('dog.n.01')
>>> wn.synsets('dog')[0].definition()
u'a member of the genus Canis (probably descended from the common wolf) that has been domesticated by man since prehistoric times; occurs in many breeds'
>>> wn.synsets('dog')[0].lemmas()
[Lemma('dog.n.01.dog'), Lemma('dog.n.01.domestic_dog'), Lemma('dog.n.01.Canis_familiaris')]
>>> wn.synsets('dog')[0].lemmas()[0]
Lemma('dog.n.01.dog')
>>> wn.synsets('dog')[0].lemmas()[0].name()
u'dog'

The _name attribute of a Lemma object (returned by name()) is a unicode string, not a list. See the code at https://github.com/nltk/nltk/blob/develop/nltk/corpus/reader/wordnet.py#L202 and https://github.com/nltk/nltk/blob/develop/nltk/corpus/reader/wordnet.py#L444

And it seems that each Lemma belongs to exactly one Synset. From the docstring at https://github.com/nltk/nltk/blob/develop/nltk/corpus/reader/wordnet.py#L220:

Lemma attributes, accessible via methods with the same name:

  • name: The canonical name of this lemma.
  • synset: The synset that this lemma belongs to.
  • syntactic_marker: For adjectives, the WordNet string identifying the syntactic position relative to the modified noun (see http://wordnet.princeton.edu/man/wninput.5WN.html#sect10). For all other parts of speech, this attribute is None.
  • count: The frequency of this lemma in WordNet.

So we can do this, knowing that each Lemma object will return exactly one Synset:

>>> wn.synsets('dog')[0].lemmas()[0]
Lemma('dog.n.01.dog')
>>> wn.synsets('dog')[0].lemmas()[0].synset()
Synset('dog.n.01')
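Because each Lemma belongs to exactly one Synset, its printed identifier already names that Synset: Lemma('bad.a.01.bad') from the question is the lemma bad inside Synset('bad.a.01'). The sketch below splits such a key into its two parts. split_lemma_key is a hypothetical helper, and it assumes keys follow the "<word>.<pos>.<sense number>.<lemma name>" pattern seen in the outputs above; in real code, calling .synset() on the Lemma object is the reliable route.

```python
import re

# Hypothetical helper. Assumed key format: "<word>.<pos>.<sense number>.<lemma name>",
# where <pos> is one of WordNet's n, v, a, s, r tags.
_LEMMA_KEY = re.compile(r"^(.+?\.[nvasr]\.\d+)\.(.+)$")


def split_lemma_key(key):
    """Return (synset name, lemma name) for a lemma key such as 'dog.n.01.dog'."""
    match = _LEMMA_KEY.match(key)
    if match is None:
        raise ValueError(f"not a lemma key: {key!r}")
    synset_name, lemma_name = match.groups()
    return synset_name, lemma_name


print(split_lemma_key("dog.n.01.domestic_dog"))  # ('dog.n.01', 'domestic_dog')
print(split_lemma_key("bad.a.01.bad"))           # ('bad.a.01', 'bad')
```

The split is unambiguous for the same reason lemma.synset() returns a single Synset: a lemma key is just a synset name with one lemma name appended.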

Assuming that you are trying to do some sentiment analysis and need the antonyms of every adjective in WordNet, you can get the Synsets of the antonyms like this:

>>> from itertools import chain
>>> from nltk.corpus import wordnet as wn
>>> all_adj_in_wn = wn.all_synsets(pos='a')
>>> def get_antonyms(ss):
...     # antonymy is stored on lemmas, so map each antonym lemma back to its synset
...     return set(chain(*[[a.synset() for a in l.antonyms()] for l in ss.lemmas()]))
...
>>> for ss in all_adj_in_wn:
...     print(ss, ':', get_antonyms(ss))
...
Synset('able.a.01') : {Synset('unable.a.01')}

Upvotes: 1
