curious_cosmo

Reputation: 1214

Removing elements from a list that lack certain strings - python

I have a large list that looks something like this:

entries = ["['stuff']...other stuff", "['stuff']...stuff", "['stuff']...more stuff", ...]

I want to remove all elements of the list that don't contain the words "other" or "things".

I tried this, but it isn't removing all of the elements I need it to (only some near the end):

for e in entries:
    if 'other' or 'things' not in e:
        entries.remove(e)
print entries

What am I doing wrong?

Upvotes: 1

Views: 2184

Answers (3)

Solaxun

Reputation: 2792

As others have already pointed out, in your version there are three main problems:

for e in entries:
    if 'other' or 'things' not in e: # parsed as 'other' or ('things' not in e); the non-empty string 'other' is always truthy. You also need and, not or.
        entries.remove(e) # removing items from the list you are iterating over makes the loop skip elements
print entries
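To see the skipping problem concretely, here is a small sketch with made-up sample data. Even with the corrected condition, removing from the list inside the loop skips whichever entry slides into the freed slot:

```python
# Hypothetical sample data: the first two entries contain neither keyword.
entries = ["no match here", "still nothing", "has other", "more things"]

for e in entries:  # iterating over the same list we shrink below
    if "other" not in e and "things" not in e:
        entries.remove(e)  # everything after e shifts left by one

# "still nothing" moved into index 0 after the first removal, so the
# loop never looked at it and it survives.
print(entries)  # ['still nothing', 'has other', 'more things']
```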

Here is your version, revised to fix the above problems:

for e in entries[:]: # entries[:] is a copy of entries, so removing from the original no longer disturbs the loop
    if 'other' not in e and 'things' not in e: # remove entries that contain neither 'other' nor 'things'
        entries.remove(e)
print(entries)

And here are some alternative ways to do this:

import re

words = ['this doesnt contain chars you want so gone',
         'this contains other so will be included',
         'this is included bc stuff']

answer = list(filter(lambda x: re.search('other|stuff', x), words))  # keep items where the pattern matches
other_way = [sentence for sentence in words if re.search('other|stuff', sentence)]  # same result as a list comprehension

print(answer)
print(other_way)
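If the keywords come from a list, the alternation pattern can be built from it. The sketch below assumes a keyword list and sample entries of its own; re.escape makes characters such as '.' or '+' in a keyword match literally instead of being treated as regex syntax:

```python
import re

# Hypothetical keyword list and sample entries.
keywords = ["other", "things"]
pattern = re.compile("|".join(re.escape(k) for k in keywords))

entries = ["['stuff']...other stuff", "['stuff']...stuff", "['stuff']...more things"]

# Keep only entries in which at least one keyword occurs.
kept = [e for e in entries if pattern.search(e)]
print(kept)  # ["['stuff']...other stuff", "['stuff']...more things"]
```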

Upvotes: 0

Eugene Yarmash

Reputation: 149776

You shouldn't remove items from a list while iterating over it. Also, your conditional doesn't do what you mean: it checks 'other' for truthiness and only 'things' for containment. To fix it, spell out both tests separately and join them with and: 'other' not in e and 'things' not in e.
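A quick way to check how Python reads the original condition is to evaluate it on a sample string (the string here is made up for illustration):

```python
e = "this entry has other in it"

# `not in` binds more tightly than `or`, so the original test is parsed as:
buggy = 'other' or ('things' not in e)
print(repr(buggy))  # 'other': a non-empty string, which is always truthy

# What was intended: the entry contains neither keyword.
lacks_both = 'other' not in e and 'things' not in e
print(lacks_both)  # False, because e contains "other"
```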

If the list is not too large to copy, you can simply rebuild it with a list comprehension that keeps only the entries you want:

entries = [e for e in entries if "other" in e or "things" in e]
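Assigning the comprehension to entries creates a new list object. If other code holds a reference to the original list, slice assignment replaces its contents in place instead; it still builds the whole new list first, so it does not save memory. A minimal sketch with made-up data:

```python
entries = ["a other b", "nothing", "more things"]
alias = entries  # a second reference to the same list object

# Slice assignment mutates the existing list, so `alias` sees the result.
entries[:] = [e for e in entries if "other" in e or "things" in e]

print(alias)             # ['a other b', 'more things']
print(alias is entries)  # True
```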

Otherwise, modify the list in place by looping from the end to the beginning and deleting items by index; each deletion then only shifts items you have already checked.

for i in range(len(entries) - 1, -1, -1):
    if "other" not in entries[i] and "things" not in entries[i]:
        del entries[i]

Upvotes: 2

Moinuddin Quadri

Reputation: 48067

You can use a list comprehension with any(..) to check for the substrings:

>>> [entry for entry in entries if any(something in entry for something in ["other", "things"])]

This returns a new list containing only the entries that contain "other" or "things".
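The same idea can be wrapped in a reusable function that takes the keywords as a parameter. The name keep_matching and the sample entries are hypothetical; any() stops checking an entry as soon as one keyword is found:

```python
def keep_matching(entries, keywords):
    """Return a new list of the entries that contain at least one keyword.

    keep_matching is a hypothetical helper name used for illustration.
    """
    return [entry for entry in entries if any(k in entry for k in keywords)]


entries = ["['stuff']...other stuff", "['stuff']...stuff", "['stuff']...more things"]
print(keep_matching(entries, ("other", "things")))
# ["['stuff']...other stuff", "['stuff']...more things"]
```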

Upvotes: 0
