Reputation: 345
I have a list of lists, where each inner list holds the separated words of one review. It looks like this:
texts = [['fine','for','a','night'],['it','was','good']]
I want to remove all stopwords, using the nltk.corpus package, and put the remaining words back into the list. The end result should be a list of lists of words without stopwords. This is what I tried:
import nltk
nltk.download()  # to download stopwords corpus
from nltk.corpus import stopwords
stopwords=stopwords.words('english')
words_reviews=[]
for review in texts:
    wr=[]
    for word in review:
        if word not in stopwords:
            wr.append(word)
    words_reviews.append(wr)
This code actually worked, but now I get the error: AttributeError: 'list' object has no attribute 'words', referring to stopwords. I made sure that I installed all packages. What could be the problem?
Upvotes: 0
Views: 5120
Reputation: 31
Instead of
[word for word in text_tokens if not word in stopwords.words()]
use a list that was built once beforehand (here called all_stopwords):
[word for word in text_tokens if not word in all_stopwords]
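A small sketch of the difference, written with a stand-in function so it runs without the NLTK corpus installed. The function load_stop_words and its four-word list are hypothetical; with NLTK, all_stopwords would presumably come from a single call to stopwords.words('english') made before the comprehension.

```python
# Hypothetical stand-in for stopwords.words('english'); it counts how often it is called.
calls = 0

def load_stop_words():
    global calls
    calls += 1
    return ['for', 'a', 'it', 'was']

text_tokens = ['fine', 'for', 'a', 'night']

# Calling the loader inside the condition re-evaluates it for every token.
filtered_slow = [word for word in text_tokens if word not in load_stop_words()]
calls_inside = calls

# Building the list once and reusing it gives the same result with a single call.
calls = 0
all_stopwords = load_stop_words()
filtered = [word for word in text_tokens if word not in all_stopwords]
calls_outside = calls

print(filtered, calls_inside, calls_outside)
```

Both comprehensions keep the same words; only the number of loader calls differs.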
Upvotes: 0
Reputation: 50190
The problem is that you redefine stopwords in your code:
from nltk.corpus import stopwords
stopwords=stopwords.words('english')
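After the first line, stopwords is a corpus reader with a words() method. After the second line, it is a list. That is why the code worked once and then failed: if the second line runs again in the same session without the import being re-executed (for example, in a notebook or interactive shell), it calls words() on a list and raises the AttributeError. Give the list a different name, or re-run the import first.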
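The rebinding can be reproduced without NLTK. In this sketch, FakeCorpusReader is a hypothetical stand-in for the object that nltk.corpus exposes as stopwords, and its word list is made up; only the name-shadowing behavior is the point.

```python
class FakeCorpusReader:
    """Hypothetical stand-in for the NLTK stopwords corpus reader."""

    def words(self, language):
        return ['for', 'a', 'it', 'was']


stopwords = FakeCorpusReader()

# First run: works, but rebinds the name stopwords to a plain list.
stopwords = stopwords.words('english')

# Second run of the same line: stopwords is now a list, so words() is missing.
try:
    stopwords = stopwords.words('english')
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)

# One possible fix: keep the reader and the word list under different names.
reader = FakeCorpusReader()
stop_words = reader.words('english')
stop_words_again = reader.words('english')  # safe to repeat
```

With separate names, the line that builds the word list can be re-run any number of times.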
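Also, looking things up in a list is slow because every membership test scans the list, so you'll get much better performance if you use a set (stored under a new name here, so the reader is not overwritten):
stop_words = set(stopwords.words('english'))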
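Applied to the data from the question, the set-based filter might look like the sketch below. The list english_stop_words is a hypothetical stand-in for the result of stopwords.words('english'), so the snippet runs without the corpus; the real NLTK list is much longer.

```python
# Hypothetical stand-in for stopwords.words('english').
english_stop_words = ['for', 'a', 'it', 'was']

# A set gives constant-time membership tests on average.
stop_words = set(english_stop_words)

texts = [['fine', 'for', 'a', 'night'], ['it', 'was', 'good']]

# Keep the list-of-lists shape: one filtered word list per review.
words_reviews = [
    [word for word in review if word not in stop_words]
    for review in texts
]

print(words_reviews)
```

The nested comprehension does the same work as the two nested for loops in the question, and each review keeps its own inner list.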
Upvotes: 4