Lisadk

Reputation: 345

Remove stopwords with nltk.corpus from list with lists

I have a list of lists, each containing the separated words of a review, which looks like this:

texts = [['fine','for','a','night'],['it','was','good']]

I want to remove all stopwords using the nltk.corpus package and put the remaining words back into the list. The end result should be a list of lists of words without stopwords. This is what I tried:

import nltk
nltk.download() # to download stopwords corpus
from nltk.corpus import stopwords
stopwords=stopwords.words('english')
words_reviews=[]

for review in texts:
    wr=[]
    for word in review:
        if word not in stopwords:
            wr.append(word)
        words_reviews.append(wr)

This code used to work, but now I get the error: AttributeError: 'list' object has no attribute 'words', referring to stopwords. I made sure that I installed all packages. What could be the problem?

Upvotes: 0

Views: 5120

Answers (3)

WARUTS

Reputation: 72

I removed the set and it worked; maybe you could try the same.

Upvotes: 0

Niranjan Mangotri

Reputation: 31

Instead of

[word for word in text_tokens if not word in stopwords.words()]

use

[word for word in text_tokens if not word in all_stopwords]

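After stopwords = stopwords.words('english'), the name stopwords refers to a plain list instead of the corpus reader, so none of its previous attributes (such as words()) will work. Storing the list under a different name, such as all_stopwords, avoids this.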

Upvotes: 0

alexis

Reputation: 50190

The problem is that you redefine stopwords in your code:

from nltk.corpus import stopwords
stopwords=stopwords.words('english')

After the first line, stopwords is a corpus reader with a words() method. After the second line, it is a list. Proceed accordingly.

Also, looking things up in a list is slow because each membership test scans the list, so you'll get much better performance if you use a set:

stopwords = set(stopwords.words('english'))
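Putting both points together, here is a minimal sketch of one way to avoid the name clash. The names remove_stopwords, load_english_stopwords and sample_stop_words are made up for illustration and are not part of NLTK; the small hand-written stopword list is only a stand-in so the example runs without the corpus installed.

```python
def remove_stopwords(texts, stop_words):
    """Return a new list of lists with every word found in stop_words removed."""
    stop_set = set(stop_words)  # set membership tests are fast
    return [[word for word in review if word not in stop_set] for review in texts]


def load_english_stopwords():
    """Load NLTK's English stopwords without rebinding the name `stopwords`.

    Hypothetical helper; it is not called below, so the example runs
    even if nltk is not installed.
    """
    import nltk
    from nltk.corpus import stopwords
    try:
        return stopwords.words('english')
    except LookupError:
        # The corpus is missing locally, so fetch it once and retry.
        nltk.download('stopwords')
        return stopwords.words('english')


texts = [['fine', 'for', 'a', 'night'], ['it', 'was', 'good']]

# Stand-in list for demonstration only; with NLTK available you could pass
# load_english_stopwords() instead.
sample_stop_words = ['for', 'a', 'it', 'was']

cleaned = remove_stopwords(texts, sample_stop_words)
print(cleaned)
```

Because the stopword list lives under its own name, the imported stopwords corpus reader is never overwritten, so re-running the code in the same session should not raise the AttributeError.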

Upvotes: 4
