Reputation: 91
Is there a way in Python 2.7 using NLTK to get just the word, without the extra formatting that includes "Synset", the parentheses, the "n.01", etc.?
For instance if I do
wn.synsets('dog')
My results look like:
[Synset('dog.n.01'), Synset('frump.n.01'), Synset('dog.n.03'), Synset('cad.n.01'), Synset('frank.n.02'), Synset('pawl.n.01'), Synset('andiron.n.01'), Synset('chase.v.01')]
How can I instead get a list like this?
dog
frump
cad
frank
pawl
andiron
chase
Is there a way to do this using NLTK, or do I have to use regular expressions? Can I use regular expressions within a Python script?
Upvotes: 7
Views: 6833
Reputation: 71
aelfric5578, you're close: in newer NLTK versions name is a method, not a string attribute, so it has to be called:
[synset.name().split('.')[0] for synset in wn.synsets('dog') ]
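Note that the desired list in the question shows dog only once, while the comprehension above yields it twice (from dog.n.01 and dog.n.03). Here is a minimal sketch for dropping repeats while keeping the original order. It works on plain name strings copied from the question, so it runs without the WordNet data, and unique_head_words is a hypothetical helper name, not part of NLTK:

```python
def unique_head_words(synset_names):
    """Return the word part of each 'word.pos.nn' name, first occurrence only."""
    seen = set()
    words = []
    for name in synset_names:
        word = name.split('.')[0]   # text before the first period
        if word not in seen:
            seen.add(word)
            words.append(word)
    return words

# Names copied from the question's output for 'dog'.
names = ['dog.n.01', 'frump.n.01', 'dog.n.03', 'cad.n.01',
         'frank.n.02', 'pawl.n.01', 'andiron.n.01', 'chase.v.01']
result = unique_head_words(names)
print(result)
```

With real synsets you would build the input from wn.synsets('dog'), using s.name() or s.name depending on your NLTK version.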
Upvotes: 3
Reputation: 1222
It is very simple: just create a list and then get the first value.
from nltk.corpus import wordnet as wn
syn = []
for s in wn.synsets('dog'):
    syn.append(s)
# syn[0] is the first Synset object, not yet the bare word
print(syn[0])
Upvotes: 0
Reputation: 122052
Using the lemma name might work, but the Synset object has a canonical attribute for the synset name. Try:
>>> from nltk.corpus import wordnet as wn
>>> wn.synset('dog.n.1')
Synset('dog.n.01')
>>> wn.synset('dog.n.1').name
'dog.n.01'
>>> wn.synset('dog.n.1').name.partition('.')[0]
'dog'
>>> for ss in wn.synsets('dog'):
...     print ss.name.partition('.')[0]
...
dog
frump
dog
cad
frank
pawl
andiron
chase
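The session above reads name as an attribute, while another answer in this thread calls it as a method. Which one works depends on the installed NLTK version (as far as I know, NLTK 3 turned these attributes into methods). A small sketch that tolerates both follows. The two classes are hypothetical stand-ins for real Synset objects so the example runs without WordNet, and synset_name and head_word are made-up helper names:

```python
def synset_name(synset):
    """Return the synset's name whether `name` is an attribute or a method."""
    value = synset.name
    return value() if callable(value) else value

def head_word(synset):
    """Return the text before the first period of the synset name."""
    return synset_name(synset).partition('.')[0]

class OldStyleSynset(object):
    # Hypothetical stand-in: name stored as a plain string attribute.
    name = 'dog.n.01'

class NewStyleSynset(object):
    # Hypothetical stand-in: name exposed as a method.
    def name(self):
        return 'chase.v.01'

old_word = head_word(OldStyleSynset())
new_word = head_word(NewStyleSynset())
print(old_word)
print(new_word)
```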
Upvotes: 0
Reputation: 237
Try this:
for synset in wn.synsets('dog'):
    print synset.lemmas[0].name
You want to iterate over each synset for dog, and then print out the headword of the synset. Keep in mind that multiple words could attach to the same synset, so if you want to get all the words associated with all the synsets for dog, you could do:
for synset in wn.synsets('dog'):
    for lemma in synset.lemmas:
        print lemma.name
Upvotes: 4
Reputation: 1063
If you want to do this without regular expressions, you can use a list comprehension.
[synset.name.split('.')[0] for synset in wn.synsets('dog') ]
This says: for each synset, take the part of its name that comes before the first period.
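The question also asks whether regular expressions can be used inside a Python script. They can, through the standard re module. Here is a minimal sketch of that route, assuming you only have the printed form of the results as text. SYNSET_REPR and words_from_reprs are hypothetical names, and the input is a shortened copy of the question's output:

```python
import re

# Captures the word before the first '.' in a repr such as "Synset('dog.n.01')".
SYNSET_REPR = re.compile(r"Synset\('([^.']+)\.")

def words_from_reprs(text):
    """Return every captured word, in the order it appears in the text."""
    return SYNSET_REPR.findall(text)

text = "[Synset('dog.n.01'), Synset('frump.n.01'), Synset('chase.v.01')]"
words = words_from_reprs(text)
print(words)
```

When you still have the Synset objects, splitting the name as shown above is simpler, since the regular expression only parses the printed text. Like the split, it cuts at the first period, so a name whose word part itself contained a period would be truncated.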
Upvotes: 4