Reputation: 1753

Why does this code raise an IndexError in Python when synsets for the word exist?

I do not understand why I am getting this error. Please help

>>> mylist = []
>>> file1 = open("medDict.txt", "r")
>>> for line in file1:
    from nltk.corpus import wordnet
    print line
    wordFromList2 = wordnet.synsets(line)[0]
    mylist.append(wordFromList2)


abnormal


Traceback (most recent call last):
  File "<pyshell#10>", line 4, in <module>
    wordFromList2 = wordnet.synsets(line)[0]
IndexError: list index out of range

medDict.txt contains the following words:

abnormal
acne
ache
diarrhea
fever

Upvotes: 0

Answers (2)

alvas

Reputation: 122270

@Blender was right that wordnet.synsets() is whitespace-sensitive. For entries that contain whitespace in natural language, WordNet uses an underscore (_) in place of the space. E.g. if you want to find something like kick the bucket, you access the synsets from the NLTK WN interface with wn.synsets("kick_the_bucket"):

>>> from nltk.corpus import wordnet as wn
>>> wn.synsets('kick the bucket')
[]
>>> wn.synsets('kick_the_bucket')
[Synset('die.v.01')]

However, do note that WordNet sometimes encodes an entry with a dash instead of an underscore. E.g. 9-11 is accessible but 9_11 isn't.

>>> wn.synsets('9-11')
[Synset('9/11.n.01')]
>>> wn.synsets('9_11')
[]
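A minimal sketch of how the two conventions above could be tried in turn. The helper name lookup_phrase and its lookup parameter are hypothetical, not part of NLTK; by default it falls back to wordnet.synsets, and the separator order (underscore first, then dash) is only an assumption based on the examples above.

```python
def lookup_phrase(phrase, lookup=None):
    """Try a multi-word phrase with '_' and then '-' as the word separator.

    `lookup` is any callable that maps a string to a list of synsets.
    (Hypothetical helper for illustration; not an NLTK function.)
    """
    if lookup is None:
        # Imported lazily so the helper can also be used with a stand-in lookup.
        from nltk.corpus import wordnet
        lookup = wordnet.synsets

    tokens = phrase.split()  # also drops leading/trailing whitespace and "\n"
    for separator in ("_", "-"):
        synsets = lookup(separator.join(tokens))
        if synsets:
            return synsets
    return []  # neither form is known
```

With NLTK and the WordNet corpus installed, lookup_phrase("kick the bucket") would query "kick_the_bucket" first and only fall back to "kick-the-bucket" if that returns nothing.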

Now to resolve your problems with your code.

1. When you read a file line by line, each line still ends with the invisible newline character \n, so wordnet.synsets() receives "abnormal\n" rather than "abnormal" and returns an empty list. So you need to change this:

>>> mylist = []
>>> file1 = open("medDict.txt", "r")

to this:

>>> words_from_file = [i.strip() for i in open("medDict.txt", "r")]

2. I'm not sure you really want wordnet.synsets(word)[0]. It takes only the first sense, which might not be the Most Frequent Sense (MFS). So instead of doing this:

>>> wordFromList2 = wordnet.synsets(line)[0]
>>> mylist.append(wordFromList2)

I think the more appropriate way is to use a set and update it with all the synsets of each word:

>>> list_of_synsets = set()
>>> for i in words_from_file:
...     list_of_synsets.update(wordnet.synsets(i))
...
>>> print(list_of_synsets)
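To make the newline issue from point 1 visible, here is a small sketch that uses io.StringIO as a stand-in for medDict.txt (an assumption, so it runs without the real file). It is written with print() calls so it runs on Python 3.

```python
import io

# Stand-in for open("medDict.txt", "r"); the contents mirror the question.
fake_file = io.StringIO(u"abnormal\nacne\nache\n")

# Iterating over a file object keeps the line terminator on every line.
raw_lines = list(fake_file)

# strip() removes the newline and any surrounding spaces.
words = [line.strip() for line in raw_lines]

print(repr(raw_lines[0]))  # repr() makes the trailing "\n" visible
print(words)
```

Printing with repr() is a quick way to spot whitespace that a plain print hides.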

Upvotes: 1

Blender

Reputation: 298532

wordnet.synsets() is whitespace-sensitive:

>>> wordnet.synsets('abnormal')
    [Synset('abnormal.a.01'), Synset('abnormal.a.02'), Synset('abnormal.s.03')]
>>> wordnet.synsets(' abnormal')
    []

.strip() the whitespace from your line before passing it in:

wordFromList2 = wordnet.synsets(line.strip())[0]
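Note that indexing with [0] will still raise IndexError for a blank line or for a word WordNet does not know. A hedged sketch of a guarded version follows; the function name first_synsets and its lookup parameter are hypothetical, and the default simply delegates to wordnet.synsets.

```python
def first_synsets(lines, lookup=None):
    """Return the first synset of every line that has at least one.

    `lookup` is any callable that maps a word to a list of synsets.
    (Hypothetical helper for illustration; not an NLTK function.)
    """
    if lookup is None:
        # Imported lazily so the helper can also be used with a stand-in lookup.
        from nltk.corpus import wordnet
        lookup = wordnet.synsets

    found = []
    for line in lines:
        word = line.strip()
        if not word:
            continue  # blank line: nothing to look up
        synsets = lookup(word)
        if synsets:  # an empty list means no entry for this form
            found.append(synsets[0])
    return found
```

Usage, assuming NLTK and the WordNet corpus are installed:

```python
with open("medDict.txt", "r") as file1:
    mylist = first_synsets(file1)
```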

Upvotes: 0
