JudyJiang

Reputation: 2227

Python: process multiple strings at the same time

I have a list of strings and I want to remove the stop words inside each string. The problem is that the stop-word list is much longer than the strings, and I don't want to compare each string against the whole stop-word list over and over. Is there a way in Python to process these multiple strings at the same time?

lis = ['aka', 'this is a good day', 'a pretty dog']
stopwords = []  # pretty long list of words
result = []
for phrase in lis:
    words = phrase.split(' ')  # get list of words
    kept = []
    for word in words:
        if word not in stopwords:  # linear scan of the stop-word list
            kept.append(word)
    result.append(' '.join(kept))

This is my current method, but it means I have to go through all the phrases in the list one by one. Is there a way to process these phrases with only one comparison?

Thanks.

Upvotes: 0

Views: 144

Answers (2)

Raydel Miranda

Reputation: 14360

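You could compute the set difference between the words of each phrase and the stop words. Note that sets are unordered, so the remaining words of a phrase are not guaranteed to come back in their original order.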

>>> lis = ['aka', 'this is a good day', 'a pretty dog']
>>> stopwords = ['a', 'dog']

>>> stop = set(stopwords)
>>> result = list(map(lambda phrase: " ".join(set(phrase.split(' ')) - stop), lis))
>>> print(result)

['aka', 'this is good day', 'pretty']
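A small sketch of how set difference differs from a word-by-word filter, using a hypothetical phrase that contains a repeated word: the set version keeps each word only once and its order is arbitrary, while the filter keeps both the order and the duplicate.

```python
stop = {'a', 'dog'}
phrase = 'the dog chased the cat'  # hypothetical phrase with a repeated word

# Set difference: word order is arbitrary and the second 'the' is lost.
by_set = ' '.join(set(phrase.split(' ')) - stop)

# Filtering word by word keeps the original order and any duplicates.
by_filter = ' '.join(w for w in phrase.split(' ') if w not in stop)

print(by_filter)                  # the chased the cat
print(sorted(by_set.split(' ')))  # ['cat', 'chased', 'the']
```

Sorting the set-based result is only done here to make the output deterministic; the unsorted string may print in any order.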

Upvotes: 1

Cory Kramer

Reputation: 117856

This is the same idea, but with a few improvements. Convert your list of stop words to a set, so each membership test is a hash lookup instead of a scan through the whole list. Then iterate over your phrase list in a list comprehension: for each phrase, iterate over its words, keep the ones that are not in the stop set, and join the phrase back together. Unlike a set difference, this also preserves the original word order.

>>> lis = ['aka', 'this is a good day', 'a pretty dog']
>>> stopwords = ['a', 'dog']
>>> stop = set(stopwords)
>>> [' '.join(j for j in i.split(' ') if j not in stop) for i in lis]
['aka', 'this is good day', 'pretty']
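To see why converting to a set matters when the stop-word list is long, here is a rough timing sketch. The 10,000 filler stop words are made up purely to lengthen the list, and `remove_stops` is a hypothetical helper name. A list lookup compares the word against elements one at a time, while a set lookup hashes the word, so its average cost does not grow with the number of stop words. Absolute timings depend on the machine, so none are quoted here.

```python
import timeit

# Hypothetical data: a long stop-word list padded with filler words.
stopwords = ['filler%d' % i for i in range(10000)] + ['a', 'dog']
lis = ['aka', 'this is a good day', 'a pretty dog']


def remove_stops(phrases, stops):
    """Return each phrase with the words found in `stops` removed."""
    return [' '.join(w for w in p.split(' ') if w not in stops) for p in phrases]


stop_set = set(stopwords)  # built once, reused for every lookup

# Same result either way; only the cost of each membership test differs.
assert remove_stops(lis, stopwords) == remove_stops(lis, stop_set)

list_time = timeit.timeit(lambda: remove_stops(lis, stopwords), number=100)
set_time = timeit.timeit(lambda: remove_stops(lis, stop_set), number=100)
print('list: %.4f s  set: %.4f s' % (list_time, set_time))
```

Every phrase is still visited once in both versions; the saving comes from not rescanning the stop-word list for every word.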

Upvotes: 3
