user12907213

Clean list from stopwords

This variable:

sent=[('include', 'details', 'about', 'your performance'),
('show', 'the', 'results,', 'which', 'you\'ve', 'got')]

needs to be cleaned of stopwords. I tried

output = [w for w in sent if not w in stop_words]

but it did not work. What is wrong?

Upvotes: 3

Views: 404

Answers (3)

Sy Ker

Reputation: 2180

from nltk.corpus import stopwords

stop_words = {w.lower() for w in stopwords.words('english')}

sent = [('include', 'details', 'about', 'your', 'performance'),
        ('show', 'the', 'results,', 'which', 'you\'ve', 'got')]

If you want to create a single list of words without the stop words:

>>> no_stop_words = [word for sentence in sent for word in sentence if word not in stop_words]
>>> no_stop_words
['include', 'details', 'performance', 'show', 'results,', 'got']

If you want to keep the sentences intact:

>>> sent_no_stop = [[word for word in sentence if word not in stop_words] for sentence in sent]
>>> sent_no_stop
[['include', 'details', 'performance'], ['show', 'results,', 'got']]

However, most of the time you would work with a flat list of words (without the inner tuples):

sent = ['include', 'details', 'about', 'your', 'performance', 'show', 'the', 'results,', 'which', 'you\'ve', 'got']

>>> no_stopwords = [word for word in sent if word not in stop_words]
>>> no_stopwords
['include', 'details', 'performance', 'show', 'results,', 'got']
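If the data arrives as tuples, one way to get to that flat list first is itertools.chain.from_iterable. The following is only a sketch: the small stop_words set is a hand-picked stand-in (an assumption) so the example runs without downloading the NLTK corpus.

```python
from itertools import chain

# Assumption: a tiny hand-picked stop-word set standing in for the NLTK list.
stop_words = {"about", "your", "the", "which", "you've"}

sent = [('include', 'details', 'about', 'your', 'performance'),
        ('show', 'the', 'results,', 'which', "you've", 'got')]

# Flatten the tuples into one list of words.
words = list(chain.from_iterable(sent))

# Filter the flat list exactly as in the last example above.
no_stopwords = [word for word in words if word not in stop_words]
print(no_stopwords)
```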

Upvotes: 8

Red

Reputation: 27547

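It's the round brackets that are getting in the way of the iteration: each element of sent is a whole tuple, so the comprehension tests tuples against the stop words and never finds a match. If you can remove them: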

sent = ['include', 'details', 'about', 'your performance', 'show', 'the', 'results,', 'which', 'you\'ve', 'got']
output = [w for w in sent if w not in stop_words]

If not, then you can do this:

sent = [('include', 'details', 'about', 'your performance'), ('show', 'the', 'results,', 'which', 'you\'ve', 'got')]
output = [w for group in sent for w in group if w not in stop_words]
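To see why the original attempt returns the list unchanged, here is a minimal sketch. The stop_words set is a small hand-picked stand-in (an assumption), not the real NLTK list.

```python
# Assumption: a tiny hand-picked stop-word set standing in for the real list.
stop_words = {"about", "your", "the", "which", "you've"}

sent = [('include', 'details', 'about', 'your performance'),
        ('show', 'the', 'results,', 'which', "you've", 'got')]

# Original attempt: each w is a whole tuple, and no tuple is in stop_words,
# so nothing is filtered out.
unchanged = [w for w in sent if w not in stop_words]

# Iterating one level deeper tests each string instead of each tuple.
filtered = [w for group in sent for w in group if w not in stop_words]

print(unchanged == sent)
print(filtered)
```

Note that 'your performance' survives the filter, because it is a single string that does not match any stop word.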

Upvotes: 6

vaponteblizzard

Reputation: 186

Are you missing a quote in your actual code? Make sure to close all the strings, and escape your apostrophes with a backslash if you're using the same type of quote. I would also make every word a separate string, like this:

sent=[('include', 'details', 'about', 'your', 'performance'), ('show', 'the', 'results,', 'which', 'you\'ve', 'got')]
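If you cannot edit the data by hand, a possible way to separate the words in code is str.split(), which breaks each entry on whitespace. This is a sketch under the assumption that entries never need to stay together as phrases; the stop_words set is a small hand-picked stand-in for the real list.

```python
# Assumption: a tiny hand-picked stop-word set standing in for the real list.
stop_words = {"about", "your", "the", "which", "you've"}

sent = [('include', 'details', 'about', 'your performance'),
        ('show', 'the', 'results,', 'which', "you've", 'got')]

# Split every entry on whitespace so 'your performance' becomes two tokens.
tokens = [token for group in sent for entry in group for token in entry.split()]

# Now 'your' is a token of its own and gets filtered out.
filtered = [token for token in tokens if token not in stop_words]
print(filtered)
```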

Upvotes: 0
