Reputation: 1753
I do not understand why I am getting this error. Please help.
>>> mylist = []
>>> file1 = open("medDict.txt", "r")
>>> for line in file1:
        from nltk.corpus import wordnet
        print line
        wordFromList2 = wordnet.synsets(line)[0]
        mylist.append(wordFromList2)
abnormal
Traceback (most recent call last):
  File "<pyshell#10>", line 4, in <module>
    wordFromList2 = wordnet.synsets(line)[0]
IndexError: list index out of range
medDict.txt contains the following words:
abnormal
acne
ache
diarrhea
fever
Upvotes: 0
Views: 272
Reputation: 122270
@Blender is right that wordnet.synsets() is whitespace-sensitive. If you need to access synsets for an expression that contains whitespace in natural language, WordNet uses the underscore _ in place of the space. E.g. if you want to find something like kick the bucket, you access the synsets from the NLTK WordNet interface with wn.synsets("kick_the_bucket"):
>>> from nltk.corpus import wordnet as wn
>>> wn.synsets('kick the bucket')
[]
>>> wn.synsets('kick_the_bucket')
[Synset('die.v.01')]
However, do note that WordNet sometimes encodes a synset with dashes instead of underscores. E.g. 9-11 is accessible but 9_11 isn't.
>>> wn.synsets('9-11')
[Synset('9/11.n.01')]
>>> wn.synsets('9_11')
[]
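Since the separator varies, one option is to try both spellings of a phrase before giving up. This is only a minimal sketch: candidate_forms and lookup_phrase are hypothetical helper names, not part of NLTK, and lookup stands for whatever function you use to query WordNet (for example wn.synsets).

```python
def candidate_forms(phrase):
    """Return the lookup keys to try for a phrase, in order, without duplicates."""
    tokens = phrase.strip().split()
    forms = []
    # Try the underscore spelling first, then the dash spelling.
    for joiner in ("_", "-"):
        form = joiner.join(tokens)
        if form not in forms:
            forms.append(form)
    return forms


def lookup_phrase(phrase, lookup):
    """Return the first non-empty result of lookup() over the candidate forms.

    `lookup` is an assumption: any callable that maps a string to a list,
    such as nltk.corpus.wordnet.synsets.
    """
    for form in candidate_forms(phrase):
        result = lookup(form)
        if result:
            return result
    return []
```

With NLTK installed you would call it as lookup_phrase("kick the bucket", wn.synsets). A single word such as "abnormal" yields just one candidate, so it costs one lookup.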
Now, to resolve the problems in your code.
1. When you read a file line by line, each line still ends with the invisible newline character \n. So you need to change this:
>>> mylist = []
>>> file1 = open("medDict.txt", "r")
to this:
>>> words_from_file = [i.strip() for i in open("medDict.txt", "r")]
2. I'm not sure you really want wordnet.synsets(word)[0]. It takes only the first sense, and do note that it might not be the Most Frequent Sense (MFS). So instead of doing this:
>>> wordFromList2 = wordnet.synsets(line)[0]
>>> mylist.append(wordFromList2)
I think a more appropriate way is to use a set instead and then update the set:
>>> list_of_synsets = set()
>>> for i in words_from_file:
...     list_of_synsets.update(wordnet.synsets(i))
...
>>> print list_of_synsets
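Putting the two steps together, here is a rough sketch of how I would structure it. read_words and collect_synsets are names I made up for illustration, and lookup is assumed to be a callable like wordnet.synsets that returns a list for each word.

```python
def read_words(path):
    """Read one word per line, dropping surrounding whitespace and blank lines."""
    with open(path, "r") as handle:
        return [line.strip() for line in handle if line.strip()]


def collect_synsets(words, lookup):
    """Union the results of lookup(word) over all words into a single set.

    `lookup` is an assumption: any callable returning a list of hashable
    items, such as nltk.corpus.wordnet.synsets.
    """
    found = set()
    for word in words:
        # An empty result adds nothing, so unknown words do not raise.
        found.update(lookup(word))
    return found
```

You would then call collect_synsets(read_words("medDict.txt"), wordnet.synsets). Because nothing is indexed with [0], a word that WordNet does not know simply contributes no synsets instead of raising IndexError.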
Upvotes: 1
Reputation: 298532
wordnet.synsets() is whitespace-sensitive:
>>> wordnet.synsets('abnormal')
[Synset('abnormal.a.01'), Synset('abnormal.a.02'), Synset('abnormal.s.03')]
>>> wordnet.synsets(' abnormal')
[]
Call .strip() on your line to remove the whitespace before passing it in:
wordFromList2 = wordnet.synsets(line.strip())[0]
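Note that [0] will still raise IndexError for any word that has no synsets at all, even after stripping. If that can happen with your list, a guarded version might look like the sketch below. first_sense is a hypothetical helper name, and lookup is assumed to be a callable such as wordnet.synsets.

```python
def first_sense(line, lookup):
    """Return the first result for the stripped line, or None if there is none.

    `lookup` is an assumption: any callable that maps a string to a list,
    such as nltk.corpus.wordnet.synsets.
    """
    results = lookup(line.strip())
    # Only index into the list when it is non-empty.
    return results[0] if results else None
```

In your loop that would be wordFromList2 = first_sense(line, wordnet.synsets), followed by a check for None before appending to mylist.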
Upvotes: 0