Reputation: 48
I have this code:
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer
import re
fo = open('cran.all.1400', 'r+')
contents = fo.read()
docs = re.split(r'\.I[\s][\d]*', contents)
stop_words = set(stopwords.words('english'))
tokens = []
for each in docs:
    tokens.append(word_tokenize(each))
s_words = [w for w in tokens if not w in stop_words]
print(s_words)
When I try to run it, I get this error:
How can I solve this?
Upvotes: 0
Views: 1028
Reputation: 46
Not sure if it's related, but I think you mean [w for w in tokens if w not in stop_words].
tokens.append(word_tokenize(each)) is likely giving you a 2D list (a list of lists), so each element of tokens is itself a list. Perhaps you're expecting a one-dimensional list, in which case you could use tokens.extend(word_tokenize(each)) instead.
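Here is a rough sketch of the difference between appending and building a flat list with list.extend. To keep it runnable without NLTK data, it assumes str.split and a tiny hand-written stop-word set as stand-ins for word_tokenize and stopwords.words('english'); the sample documents are made up.

```python
# Stand-ins (assumptions, not the asker's data): a tiny stop-word set
# and str.split in place of NLTK's stopwords and word_tokenize.
stop_words = {"the", "of", "a", "in"}
docs = ["the flow of air", "a wing in the flow"]

nested = []  # built with append: one inner list per document
flat = []    # built with extend: one entry per word

for each in docs:
    words = each.split()
    nested.append(words)  # adds the whole list as a single element
    flat.extend(words)    # adds each word individually

# Every element of flat is a string, so the set membership test works.
s_words = [w for w in flat if w not in stop_words]
print(s_words)
```

With the flat list, every w is a string, so the stop-word filter can run. With the nested list, every w would be a list of words.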
Upvotes: 0
Reputation: 377
It appears that each w in tokens is a list, and you are testing whether that list belongs to a set. The in operator needs w to be hashable in order to look it up in the set, and lists are not hashable.
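A small illustration of the hashability point, with made-up sample values: looking up a string in a set works, while looking up a list raises a TypeError.

```python
stop_words = {"the", "of"}

# A string is hashable, so the membership test works.
found = "the" in stop_words
print(found)

# A list is not hashable, so the same test raises TypeError.
message = None
try:
    ["the", "flow"] in stop_words
except TypeError as exc:
    message = str(exc)
    print(message)
```

The exact wording of the message may differ between Python versions.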
Upvotes: 1