Reputation: 1093
I'm reading a list of sentences and tagging each word with NLTK's Stanford POS tagger. I get outputs like so:
wordnet_sense = []
for o in output:
    a = st.tag(o)
    wordnet_sense.append(a)
outputs: [[(u'feel', u'VB'), (u'great', u'JJ')], [(u'good', u'JJ')]]
I want to map these words with their POS, so that they are recognised in WordNet.
I've attempted this:
sense = []
for i in wordnet_sense:
    tmp = []
    for tok, pos in i:
        lower_pos = pos[0].lower()
        if lower_pos in ['a', 'n', 'v', 'r', 's']:
            res = wn.synsets(tok, lower_pos)
            if len(res) > 0:
                a = res[0]
        else:
            a = "[{0}, {1}]".format(tok, pos)
        tmp.append(a)
    sense.append(tmp)
print(sense)
outputs: [[Synset('feel.v.01'), '[great, JJ]'], ['[good, JJ]']]
So feel is recognised as a verb, but great and good are not recognised as adjectives. I've also checked whether great and good actually belong in WordNet, because I thought they weren't being mapped if they weren't there, but they are. Can anyone help?
Upvotes: 5
Views: 3696
Reputation: 1093
from nltk.corpus import wordnet as wn

def wordnet_pos_code(tag):
    # Map a Penn Treebank tag prefix to the matching WordNet POS constant.
    if tag.startswith('NN'):
        return wn.NOUN
    elif tag.startswith('VB'):
        return wn.VERB
    elif tag.startswith('JJ'):
        return wn.ADJ
    elif tag.startswith('RB'):
        return wn.ADV
    else:
        return ''

print(wordnet_pos_code('NN'))
In addition to the answer provided, I've found that this also works.
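For anyone wondering why the original loop failed: the first letter of the Penn tag 'JJ' is 'j', which is not one of WordNet's POS codes ('n', 'v', 'a', 'r', 's'), so every adjective fell through to the placeholder branch. Below is a rough sketch of how a prefix mapping like the one above could be plugged into the loop from the question. The names penn_to_wordnet and first_senses, and the synsets parameter, are my own and purely illustrative; the single-letter codes are the values behind wn.NOUN, wn.VERB, wn.ADJ and wn.ADV.

```python
# WordNet's single-letter POS codes, keyed by Penn Treebank tag prefix.
PENN_PREFIX_TO_WORDNET = {'NN': 'n', 'VB': 'v', 'JJ': 'a', 'RB': 'r'}


def penn_to_wordnet(tag):
    """Return the WordNet POS code for a Penn Treebank tag, or None."""
    return PENN_PREFIX_TO_WORDNET.get(tag[:2])


def first_senses(tagged_sentences, synsets):
    """Map each (token, tag) pair to its first synset, or a placeholder string.

    `synsets` is any callable shaped like wn.synsets(word, pos); it is a
    hypothetical parameter so the lookup can be swapped or stubbed.
    """
    result = []
    for sentence in tagged_sentences:
        row = []
        for tok, tag in sentence:
            pos = penn_to_wordnet(tag)
            # Only query WordNet when the tag maps to a known POS code.
            found = synsets(tok, pos) if pos else []
            row.append(found[0] if found else "[{0}, {1}]".format(tok, tag))
        result.append(row)
    return result


# With NLTK and its WordNet corpus installed, this would be called as:
#   from nltk.corpus import wordnet as wn
#   sense = first_senses(wordnet_sense, wn.synsets)
```

Passing the lookup in as an argument is only a convenience for trying the mapping without the corpus; calling wn.synsets directly inside the loop would work just as well.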
Upvotes: 2