Stopword removal in a Python list

Question

I have a list of sentences as follows

pylist=['This is an apple', 'This is an orange', 'The pineapple is yellow','A grape is red']

If I define a stopwords list such as

stopwords=['This', 'is', 'an', 'The']

Is there a way for me to apply this to the entire list such that my output is

pylist=['apple','orange','pineapple is yellow','A grape is red']

PS: I tried applying a function that removes stopwords, as in [removewords(x) for x in pylist], but it didn't work (and I'm not sure that's the most efficient way). Thanks!

yatu · Accepted Answer

You could use a nested list comprehension and define stopwords as a set, so that each membership test takes O(1) time on average instead of scanning a list:

pylist = ['This is an apple', 'This is an orange', 'The pineapple is yellow',
          'A grape is red']
stopwords = {'This', 'is', 'an', 'The'}

[' '.join([w for w in s.split() if w not in stopwords]) for s in pylist]
# ['apple', 'orange', 'pineapple yellow', 'A grape red']
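
Since the question mentions a removewords helper, the same comprehension can be wrapped in a function. This is only a minimal sketch: the name remove_stopwords and its signature are made up for illustration, and matching stays case-sensitive, as in the snippet above.

```python
def remove_stopwords(sentences, stopwords):
    """Return the sentences with every word found in `stopwords` removed.

    Hypothetical helper; matching is case-sensitive and splits on whitespace.
    """
    # Build the set once so each lookup is O(1) on average.
    stop_set = set(stopwords)
    return [' '.join(w for w in s.split() if w not in stop_set)
            for s in sentences]


pylist = ['This is an apple', 'This is an orange', 'The pineapple is yellow',
          'A grape is red']
result = remove_stopwords(pylist, ['This', 'is', 'an', 'The'])
print(result)
# ['apple', 'orange', 'pineapple yellow', 'A grape red']
```

Because the set is built inside the function, callers can pass a list, tuple, or set of stopwords.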

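Note, however, that for a more general approach you can use the stopword list from NLTK's English corpus (it has to be downloaded once with nltk.download('stopwords')). Its entries are lowercase, so each word is lowercased before the lookup: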

from nltk.corpus import stopwords
stop_w = set(stopwords.words('english'))

[' '.join([w for w in s.split() if w.lower() not in stop_w]) for s in pylist]
# ['apple', 'orange', 'pineapple yellow', 'grape red']
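
The output requested in the question keeps 'is' in the last two sentences, which would be consistent with stripping stopwords only from the start of each sentence. If that is what is wanted (an assumption, since the question doesn't say so), a sketch with itertools.dropwhile could look like this; strip_leading_stopwords and stop_set are names made up for the example.

```python
from itertools import dropwhile


def strip_leading_stopwords(sentence, stop_set):
    """Drop stopwords from the start of `sentence` only (hypothetical helper)."""
    # dropwhile skips words while they are stopwords, then yields the rest as-is.
    words = dropwhile(lambda w: w in stop_set, sentence.split())
    return ' '.join(words)


stop_set = {'This', 'is', 'an', 'The'}
pylist = ['This is an apple', 'This is an orange', 'The pineapple is yellow',
          'A grape is red']
leading_only = [strip_leading_stopwords(s, stop_set) for s in pylist]
print(leading_only)
# ['apple', 'orange', 'pineapple is yellow', 'A grape is red']
```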
