Creating a boolean index in a for loop in pandas

Question

I would like to get a subset of a pandas dataframe with boolean indexing.

The condition I want to test has the form (df[var_0] == value_0) & ... & (df[var_n] == value_n), where the number n of variables involved can change. As a result, I cannot write it out as a fixed expression such as:

df = df[(df[var_0] == value_0) & ... & (df[var_n] == value_n)]

I could do something like:

for k in range(0,n+1) :
    df = df[df[var_k] == value_k]

(with some try/except to make sure it still works if the dataframe becomes empty), but that does not seem very efficient. Does anyone have an idea of how to write this as a clean pandas expression?
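
For reference, a minimal sketch of the loop idea above, assuming the variable/value pairs are held in a dict. The name `conditions` and the sample data are hypothetical, not taken from the question. Instead of re-slicing the dataframe on every pass, it builds one boolean mask per condition, ANDs them together, and indexes once, so an empty intermediate result needs no special handling:

```python
from functools import reduce
import operator

import pandas as pd

# Hypothetical sample data and conditions, for illustration only.
df = pd.DataFrame({"a": [6, 3, 4, 3], "b": [3, 0, 8, 0], "c": [1, 8, 0, 5]})
conditions = {"a": 3, "b": 0}  # column name -> required value

# Start from an all-True mask so that an empty `conditions` keeps every row.
all_true = pd.Series(True, index=df.index)

# AND together one boolean Series per (column, value) pair.
mask = reduce(
    operator.and_,
    (df[col] == value for col, value in conditions.items()),
    all_true,
)

subset = df[mask]
print(subset)
```

The dataframe is indexed only once, however many conditions there are.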

TomAugspurger · Accepted Answer

The isin method should work for you here.

In [7]: df
Out[7]: 
   a  b  c  d  e
0  6  3  1  9  6
1  8  9  5  7  2
2  6  4  7  4  3
3  4  8  0  0  5
4  4  4  2  3  4
5  2  5  9  0  9
6  4  8  2  9  1
7  3  0  8  9  7
8  0  5  9  9  6
9  0  7  8  4  8

[10 rows x 5 columns]

In [8]: vals = {'a': [3], 'b': [0], 'c': [8], 'd': [9], 'e': [7]}

In [9]: df.isin(vals)
Out[9]: 
       a      b      c      d      e
0  False  False  False   True  False
1  False  False  False  False  False
2  False  False  False  False  False
3  False  False  False  False  False
4  False  False  False  False  False
5  False  False  False  False  False
6  False  False  False   True  False
7   True   True   True   True   True
8  False  False  False   True  False
9  False  False   True  False  False

[10 rows x 5 columns]

In [10]: df[df.isin(vals).all(1)]
Out[10]: 
   a  b  c  d  e
7  3  0  8  9  7

[1 rows x 5 columns]

The values in the vals dict need to be collections, so I put them into length-1 lists. It's possible that query can do this too.
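
As a hedged sketch, the approach above can be wrapped in a helper that takes plain scalar values and tests only some of the columns. The function names `filter_equal` and `filter_equal_query` are hypothetical; the data is the dataframe shown above. One detail to watch: when `isin` is given a dict, columns that are not keys of the dict come back all False, so the helper selects the tested columns before calling `.all(axis=1)`. The `query` variant assumes column names are valid Python identifiers and values are simple literals such as ints or strings.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [6, 8, 6, 4, 4, 2, 4, 3, 0, 0],
    "b": [3, 9, 4, 8, 4, 5, 8, 0, 5, 7],
    "c": [1, 5, 7, 0, 2, 9, 2, 8, 9, 8],
    "d": [9, 7, 4, 0, 3, 0, 9, 9, 9, 4],
    "e": [6, 2, 3, 5, 4, 9, 1, 7, 6, 8],
})


def filter_equal(df, conditions):
    """Keep rows where each column named in `conditions` equals its value."""
    cols = list(conditions)
    # isin expects a collection per column, so wrap each scalar in a list.
    vals = {col: [value] for col, value in conditions.items()}
    # Select only the tested columns first: columns missing from `vals`
    # would otherwise be all False and make .all(axis=1) reject every row.
    mask = df[cols].isin(vals).all(axis=1)
    return df[mask]


def filter_equal_query(df, conditions):
    """Same filter via DataFrame.query, building e.g. 'a == 4 and b == 8'."""
    # Assumes identifier-like column names and simple literal values.
    expr = " and ".join(f"{col} == {value!r}" for col, value in conditions.items())
    return df.query(expr)


subset = filter_equal(df, {"a": 4, "b": 8})
subset_q = filter_equal_query(df, {"a": 4, "b": 8})
print(subset)
```

Both helpers accept any number of column/value pairs, which covers the case where the number of conditions is not known in advance.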
