Gabriel

Reputation: 42459

Filtering a list based on a list of booleans

I have a list of values which I need to filter given the values in a list of booleans:

list_a = [1, 2, 4, 6]
filter = [True, False, True, False]

I generate a new filtered list with the following line:

filtered_list = [i for indx,i in enumerate(list_a) if filter[indx] == True]

which results in:

print filtered_list
[1,4]

The line works but looks (to me) a bit overkill and I was wondering if there was a simpler way to achieve the same.


Advice

Summary of two good pieces of advice given in the answers below:

1- Don't name a list filter like I did, because that shadows the built-in function filter.

2- Don't compare things to True like I did with if filter[indx] == True, since it's unnecessary. Just using if filter[indx] is enough.
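To make both points concrete, here is a small sketch. The name mask, the non-boolean mask values, and the helper shadowing_demo are made up for illustration and are not from the answers. It shows that == True and a plain truthiness test can give different results once the mask is not strictly boolean, and what happens when a name called filter hides the built-in.

```python
list_a = [1, 2, 4, 6]
mask = [1, 0, 2, ""]  # hypothetical mask: truthy, falsy, truthy, falsy

# Truthiness test: keeps every item whose mask entry is truthy.
by_truthiness = [x for x, keep in zip(list_a, mask) if keep]

# Equality test: only 1 compares equal to True, so the entry 2 is dropped.
by_equality = [x for x, keep in zip(list_a, mask) if keep == True]

print(by_truthiness)  # [1, 4]
print(by_equality)    # [1]


def shadowing_demo():
    """Show that a variable named filter hides the built-in filter()."""
    filter = [True, False, True, False]  # local name shadows the built-in
    try:
        filter(None, [1, 0, 2])  # tries to call the list, not the built-in
    except TypeError as exc:
        return str(exc)


print(shadowing_demo())
```

With a mask that only ever holds True and False, both tests select the same items, so the shorter form loses nothing.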

Upvotes: 216

Views: 193618

Answers (7)

Mauro

Reputation: 479

Maybe not so elegant, but I think this solution has simpler syntax. I renamed filter to filter_ to avoid a conflict with the built-in function:

list_a = [1, 2, 4, 6]
filter_ = [True, False, True, False]

Here is the solution:

index = [i for i in range(len(filter_)) if filter_[i]]
list_a_filtered = [list_a[i] for i in index]

or in one line:

list_a_filtered = [list_a[i] for i in [j for j in range(len(filter_)) if filter_[j]]]

Upvotes: 0

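If list_a and filter are NumPy arrays (this does not work on plain Python lists), you can use list_a[filter] to get the values where the mask is True. To get the values where it is False, use list_a[~filter].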

Upvotes: -4

Bas Swinckels

Reputation: 18488

Like so:

filtered_list = [i for (i, v) in zip(list_a, filter) if v]

Using zip is the pythonic way to iterate over multiple sequences in parallel, without needing any indexing. This assumes both sequences have the same length (zip stops after the shortest runs out). Using itertools for such a simple case is a bit overkill ...

One thing in your example that you should really stop doing is comparing things to True; this is usually not necessary. Instead of if filter[idx]==True: ..., you can simply write if filter[idx]: ....
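Since zip silently stops at the shorter sequence, a mismatched mask can go unnoticed. The following is a minimal sketch of the zip approach wrapped in a function with an explicit length check. The function name filter_by_mask and the variable mask are hypothetical, chosen here to avoid shadowing the built-in filter.

```python
def filter_by_mask(values, mask):
    """Keep the items of values whose matching entry in mask is truthy."""
    # zip would silently truncate to the shorter input, so check up front.
    if len(values) != len(mask):
        raise ValueError("values and mask must have the same length")
    return [value for value, keep in zip(values, mask) if keep]


list_a = [1, 2, 4, 6]
mask = [True, False, True, False]

filtered_list = filter_by_mask(list_a, mask)
print(filtered_list)  # [1, 4]
```

Whether the length check is worth having depends on the caller; if truncating to the shorter sequence is acceptable, the bare comprehension is enough.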

Upvotes: 75

Daniel Braun

Reputation: 2722

filtered_list = [list_a[i] for i in range(len(list_a)) if filter[i]]

Upvotes: 7

Alex Szatmary

Reputation: 3571

To do this using numpy, i.e., if you have an array, a, instead of list_a:

import numpy as np

a = np.array([1, 2, 4, 6])
my_filter = np.array([True, False, True, False], dtype=bool)
a[my_filter]
> array([1, 4])

Upvotes: 21

Hammer

Reputation: 10329

With numpy:

In [128]: list_a = np.array([1, 2, 4, 6])
In [129]: filter = np.array([True, False, True, False])
In [130]: list_a[filter]

Out[130]: array([1, 4])

Or see Alex Szatmary's answer if list_a can be a numpy array but not filter.

Numpy usually gives you a big speed boost as well:

In [133]: list_a = [1, 2, 4, 6]*10000
In [134]: fil = [True, False, True, False]*10000
In [135]: list_a_np = np.array(list_a)
In [136]: fil_np = np.array(fil)

In [139]: %timeit list(itertools.compress(list_a, fil))
1000 loops, best of 3: 625 us per loop

In [140]: %timeit list_a_np[fil_np]
10000 loops, best of 3: 173 us per loop

Upvotes: 52

Ashwini Chaudhary

Reputation: 251176

You're looking for itertools.compress:

>>> from itertools import compress
>>> list_a = [1, 2, 4, 6]
>>> fil = [True, False, True, False]
>>> list(compress(list_a, fil))
[1, 4]

Timing comparisons (py3.x):

>>> list_a = [1, 2, 4, 6]
>>> fil = [True, False, True, False]
>>> %timeit list(compress(list_a, fil))
100000 loops, best of 3: 2.58 us per loop
>>> %timeit [i for (i, v) in zip(list_a, fil) if v]  #winner
100000 loops, best of 3: 1.98 us per loop

>>> list_a = [1, 2, 4, 6]*100
>>> fil = [True, False, True, False]*100
>>> %timeit list(compress(list_a, fil))              #winner
10000 loops, best of 3: 24.3 us per loop
>>> %timeit [i for (i, v) in zip(list_a, fil) if v]
10000 loops, best of 3: 82 us per loop

>>> list_a = [1, 2, 4, 6]*10000
>>> fil = [True, False, True, False]*10000
>>> %timeit list(compress(list_a, fil))              #winner
1000 loops, best of 3: 1.66 ms per loop
>>> %timeit [i for (i, v) in zip(list_a, fil) if v] 
100 loops, best of 3: 7.65 ms per loop
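The %timeit lines above need IPython. For anyone who wants to repeat the comparison in a plain Python 3 script, here is a rough sketch using the standard timeit module. The function names with_compress and with_zip and the repeat counts are arbitrary choices for illustration, and the numbers it prints will vary with machine and Python version.

```python
import timeit
from itertools import compress

list_a = [1, 2, 4, 6] * 10000
fil = [True, False, True, False] * 10000


def with_compress():
    return list(compress(list_a, fil))


def with_zip():
    return [i for i, v in zip(list_a, fil) if v]


# Both approaches must select the same items before timing them.
assert with_compress() == with_zip()

calls = 20  # calls per timing run; kept small so the script finishes quickly
for func in (with_compress, with_zip):
    # Take the best of three runs, as %timeit's "best of 3" does.
    best = min(timeit.repeat(func, number=calls, repeat=3))
    print(f"{func.__name__}: {best / calls * 1e6:.1f} us per call")
```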

Don't use filter as a variable name; it is a built-in function.

Upvotes: 287
