Reputation: 2227
I have a list of strings and I want to remove the stop words inside each string. The thing is, the list of stop words is much longer than the strings, and I don't want to repeat the comparison against the stop word list for each string. Is there a way in Python to process these multiple strings at the same time?
lis = ['aka', 'this is a good day', 'a pretty dog']
stopwords = []  # pretty long list of words
result = []
for phrase in lis:
    kept = []
    for word in phrase.split(' '):  # get list of words
        if word not in stopwords:  # scans the whole stop word list
            kept.append(word)
    result.append(' '.join(kept))
This is my current method. But this means I have to go through every phrase in the list and compare it against the stop words. Is there a way to process all these phrases with only a single comparison pass?
Thanks.
Upvotes: 0
Views: 144
Reputation: 14360
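You could compute the set difference between the words of each phrase and the stop words. Because a set does not keep word order, the remaining words are sorted back into their original positions before joining.
>>> lis = ['aka', 'this is a good day', 'a pretty dog']
>>> stopwords = ['a', 'dog']
>>> stop = set(stopwords)
>>> def strip_stop(phrase):
...     words = phrase.split(' ')
...     # set difference, then restore the original word order
...     return " ".join(sorted(set(words) - stop, key=words.index))
...
>>> result = list(map(strip_stop, lis))  # list() because map is lazy on Python 3
>>> print(result)
['aka', 'this is good day', 'pretty']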
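One caveat with a plain set difference is that each distinct word survives at most once. Here is a minimal sketch of that effect, using a made-up phrase that is not from the question, compared with filtering word by word:

```python
stop = {'a', 'dog'}
phrase = 'a good dog is a good friend'  # hypothetical phrase with a repeated word

# Set difference: each distinct word survives at most once, in arbitrary order.
by_difference = set(phrase.split(' ')) - stop

# Filtering word by word keeps both the order and the repeated 'good'.
by_filter = [word for word in phrase.split(' ') if word not in stop]

print(sorted(by_difference))  # ['friend', 'good', 'is']
print(' '.join(by_filter))    # good is good friend
```

If repeated words matter in your phrases, the filtering form is probably the safer choice.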
Upvotes: 1
Reputation: 117856
This is the same idea, but with a few improvements. Convert your list of stopwords to a set for faster lookups. Then iterate over your phrase list in a list comprehension: for each phrase, iterate over its words, keep those that are not in the stop set, and join the phrase back together.
>>> lis = ['aka', 'this is a good day', 'a pretty dog']
>>> stopwords = ['a', 'dog']
>>> stop = set(stopwords)
>>> [' '.join(j for j in i.split(' ') if j not in stop) for i in lis]
['aka', 'this is good day', 'pretty']
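If you need this in more than one place, the comprehension could be wrapped in a small helper. This is only a sketch: the name `remove_stopwords` is made up for illustration, and it assumes words are separated by whitespace. It uses `split()` with no argument, which, unlike `split(' ')`, also collapses runs of spaces.

```python
def remove_stopwords(phrases, stopwords):
    """Return the phrases with every stop word removed (hypothetical helper)."""
    stop = set(stopwords)  # built once, then reused for every phrase
    # split() with no argument splits on any run of whitespace
    return [' '.join(w for w in phrase.split() if w not in stop)
            for phrase in phrases]


lis = ['aka', 'this is a good day', 'a pretty dog']
print(remove_stopwords(lis, ['a', 'dog']))
# ['aka', 'this is good day', 'pretty']
```

The stop word list is converted to a set only once, so each word lookup takes roughly constant time on average, however long the stop word list is.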
Upvotes: 3