Reputation:
This variable:
sent=[('include', 'details', 'about', 'your performance'),
('show', 'the', 'results,', 'which', 'you\'ve', 'got')]
needs to have its stopwords removed. I tried:
output = [w for w in sent if not w in stop_words]
but it did not work. What is wrong?
Upvotes: 3
Views: 404
Reputation: 2180
from nltk.corpus import stopwords
stop_words = {w.lower() for w in stopwords.words('english')}
sent = [('include', 'details', 'about', 'your', 'performance'),
('show', 'the', 'results,', 'which', 'you\'ve', 'got')]
If you want to create a single list of words without the stop words:
>>> no_stop_words = [word for sentence in sent for word in sentence if word not in stop_words]
>>> no_stop_words
['include', 'details', 'performance', 'show', 'results,', 'got']
If you want to keep the sentences intact:
>>> sent_no_stop = [[word for word in sentence if word not in stop_words] for sentence in sent]
>>> sent_no_stop
[['include', 'details', 'performance'], ['show', 'results,', 'got']]
However, most of the time you would work with a flat list of words (no nested tuples):
>>> sent = ['include', 'details', 'about', 'your', 'performance', 'show', 'the', 'results,', 'which', 'you\'ve', 'got']
>>> no_stopwords = [word for word in sent if word not in stop_words]
>>> no_stopwords
['include', 'details', 'performance', 'show', 'results,', 'got']
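One thing to watch for: the stop word set above is lowercased, so a capitalized token such as 'The' would not match it. A minimal sketch of a case-insensitive filter follows. STOP_WORDS is a small hand-written stand-in (an assumption, not the NLTK list) so the snippet runs without downloading the corpus, and the capitalized input is made up for illustration.

```python
# Hypothetical stand-in for the NLTK list so the sketch runs without the corpus.
STOP_WORDS = {"about", "your", "the", "which", "you've"}

# Illustrative input with some capitalized tokens.
sent = [("Include", "details", "About", "your", "performance"),
        ("Show", "The", "results,", "which", "you've", "got")]

# Compare the lowercased form, but keep the original spelling in the result.
sent_no_stop = [[word for word in sentence if word.lower() not in STOP_WORDS]
                for sentence in sent]

print(sent_no_stop)
```

With this input the result is [['Include', 'details', 'performance'], ['Show', 'results,', 'got']].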
Upvotes: 8
Reputation: 27547
The round brackets are getting in the way of the iteration: each element of sent is a tuple, so w is a whole tuple rather than a single word. If you can remove them:
sent=['include', 'details', 'about', 'your performance','show', 'the', 'results,', 'which', 'you\'ve', 'got']
output = [w for w in sent if w not in stop_words]
If not, then you can do this:
sent=[('include', 'details', 'about', 'your performance'),('show', 'the', 'results,', 'which', 'you\'ve', 'got')]
output = [w for l in sent for w in l if w not in stop_words]
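To see why the original attempt leaves the data untouched, here is a small sketch. STOP_WORDS is a hand-written stand-in (an assumption, not the NLTK list) so that it runs on its own.

```python
# Hypothetical stand-in for the NLTK stop word list.
STOP_WORDS = {"about", "your", "the", "which", "you've"}

sent = [("include", "details", "about", "your performance"),
        ("show", "the", "results,", "which", "you've", "got")]

# Iterating over sent yields tuples, and a tuple never equals a string,
# so the membership test is always False and nothing is filtered out.
unchanged = [w for w in sent if w not in STOP_WORDS]
print(unchanged == sent)  # True

# Iterating one level deeper yields the individual strings.
flat = [w for sentence in sent for w in sentence if w not in STOP_WORDS]
print(flat)
```

Note that 'your performance' survives in flat, because it is a single two-word string and that exact string is not a stop word.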
Upvotes: 6
Reputation: 186
Are you missing a quote in your actual code? Make sure to close all the strings, and escape apostrophes with a backslash when they sit inside a string delimited by the same type of quote. I would also make every word a separate string, like this:
sent=[('include', 'details', 'about', 'your', 'performance'), ('show', 'the', 'results,', 'which', 'you\'ve', 'got')]
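If editing the data by hand is not practical, the multi-word strings can also be split programmatically before filtering. This is only a sketch: STOP_WORDS is a hand-written stand-in (an assumption, not the NLTK list), and it assumes words are separated by whitespace.

```python
# Hypothetical stand-in for the NLTK stop word list.
STOP_WORDS = {"about", "your", "the", "which", "you've"}

sent = [("include", "details", "about", "your performance"),
        ("show", "the", "results,", "which", "you've", "got")]

# str.split() with no arguments splits on any run of whitespace,
# so 'your performance' becomes 'your' and 'performance'.
split_sent = [tuple(word for token in sentence for word in token.split())
              for sentence in sent]

# Filter each sentence now that every element is a single word.
cleaned = [[w for w in sentence if w not in STOP_WORDS]
           for sentence in split_sent]

print(cleaned)
```

With this input, cleaned is [['include', 'details', 'performance'], ['show', 'results,', 'got']].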
Upvotes: 0