Reputation: 11
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.tokenize import PunktSentenceTokenizer
from nltk.stem import WordNetLemmatizer
import re
import time
txt = input()
snt_tkn = sent_tokenize(txt)
wrd_tkn = [word_tokenize(s) for s in snt_tkn]
stp_wrd = set(stopwords.words("english"))
flt_snt = [w for w in wrd_tkn if not w in stp_wrd]
print(flt_snt)
This returns the following:
Traceback (most recent call last):
File "compiler.py", line 19, in
flt_snt = [w for w in wrd_tkn if not w in stp_wrd]
File "compiler.py", line 19, in
flt_snt = [w for w in wrd_tkn if not w in stp_wrd]
TypeError: unhashable type: 'list'
I'd like to know, if possible, how to return the tokenized text with stop words removed without editing wrd_tkn.
Upvotes: 1
Views: 375
Reputation: 398
The error says that a list is unhashable. Testing w in stp_wrd against a set hashes w, and here each w is a whole tokenized sentence (a list), not a single word. Lists are not hashable because they are mutable. You can convert a list to a tuple, which is immutable and hashable, using the tuple constructor:
immutable_list = tuple(some_list)
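To illustrate where the error comes from, here is a minimal sketch. The names stop_words, sentence and is_hashable are made up for this example and are not from the question. It shows that a set membership test raises TypeError for a list but not for a tuple or a string. Note that a tuple only silences the error: a tuple of tokens never equals a single stop word, so the filtering still has to happen per word.

```python
# Hypothetical stand-in for set(stopwords.words("english"))
stop_words = {"the", "is", "a"}
# One tokenized sentence, i.e. one element of wrd_tkn in the question
sentence = ["the", "cat", "is", "here"]


def is_hashable(obj):
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
        return True
    except TypeError:
        return False


list_hashable = is_hashable(sentence)          # lists are mutable, so not hashable
tuple_hashable = is_hashable(tuple(sentence))  # a tuple of strings is hashable

# Membership in a set hashes the left operand, so a list raises TypeError.
try:
    sentence in stop_words
    raised = False
except TypeError:
    raised = True

# A tuple no longer raises, but it is compared as a whole and matches nothing.
tuple_in_set = tuple(sentence) in stop_words
# A single word (a string) is what the set actually contains.
word_in_set = sentence[0] in stop_words
```

So the useful comparison is between individual words and the set, not between whole sentences and the set.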
Upvotes: 2
Reputation: 11
For future reference, here is the resolution.
Change
flt_snt = [w for w in wrd_tkn if not w in stp_wrd]
to
flt_snt = [[w for w in s if w not in stp_wrd] for s in wrd_tkn]
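As a rough check of the nested comprehension, the sketch below uses hand-written stand-ins for wrd_tkn and stp_wrd so that it runs without downloading NLTK data. In the question these values come from word_tokenize and stopwords.words("english"). The .lower() call is an addition, not part of the resolution above: NLTK's English stop words are lowercase, so a capitalized token such as "This" would otherwise pass through.

```python
import copy

# Hypothetical stand-ins; in the question these come from NLTK.
wrd_tkn = [
    ["This", "is", "a", "test", "."],
    ["It", "is", "not", "the", "end", "."],
]
stp_wrd = {"this", "is", "a", "it", "not", "the"}

# Keep a deep copy so we can confirm wrd_tkn is left untouched.
before = copy.deepcopy(wrd_tkn)

# Outer loop walks sentences, inner loop filters the words of each sentence.
# The comprehension builds new lists, so wrd_tkn itself is never modified.
flt_snt = [[w for w in s if w.lower() not in stp_wrd] for s in wrd_tkn]

print(flt_snt)
```

Because each w is now a string rather than a list, the membership test no longer raises TypeError, and the sentence structure of wrd_tkn is preserved in flt_snt.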
Upvotes: 0