Reputation: 373
I would like to get a subset of a pandas dataframe with boolean indexing.
The condition I want to test has the form (df[var_0] == value_0) & ... & (df[var_n] == value_n), where the number n of variables involved can change. As a result, I cannot hard-code:
df = df[(df[var_0] == value_0) & ... & (df[var_n] == value_n)]
I could do something like:
for k in range(n + 1):
    df = df[df[var_k] == value_k]
(with some try/except to make sure it works if the dataframe becomes empty), but that does not seem very efficient. Does anyone have an idea of how to write this in a clean pandas formulation?
Upvotes: 0
Views: 1320
Reputation: 28946
The isin method should work for you here.
In [7]: df
Out[7]:
   a  b  c  d  e
0  6  3  1  9  6
1  8  9  5  7  2
2  6  4  7  4  3
3  4  8  0  0  5
4  4  4  2  3  4
5  2  5  9  0  9
6  4  8  2  9  1
7  3  0  8  9  7
8  0  5  9  9  6
9  0  7  8  4  8
[10 rows x 5 columns]
In [8]: vals = {'a': [3], 'b': [0], 'c': [8], 'd': [9], 'e': [7]}
In [9]: df.isin(vals)
Out[9]:
       a      b      c      d      e
0  False  False  False   True  False
1  False  False  False  False  False
2  False  False  False  False  False
3  False  False  False  False  False
4  False  False  False  False  False
5  False  False  False  False  False
6  False  False  False   True  False
7   True   True   True   True   True
8  False  False  False   True  False
9  False  False   True  False  False
[10 rows x 5 columns]
In [10]: df[df.isin(vals).all(1)]
Out[10]:
   a  b  c  d  e
7  3  0  8  9  7
[1 rows x 5 columns]
The values in the vals dict need to be collections, so I put them into length-1 lists. It's possible that query can do this too.
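For a variable number of conditions given as plain scalars, the same idea can be wrapped in a small helper. This is only a sketch: filter_equal, filter_equal_query and the sample frame are hypothetical names made up for illustration, not part of pandas. One thing to watch is that isin returns False for every column missing from the dict, so the helper restricts the frame to the tested columns before calling all. The query variant assumes the column names are valid Python identifiers and that the values have a repr that query can parse (ints, floats, plain strings).

```python
import pandas as pd


def filter_equal(df, conditions):
    """Keep rows where every column named in `conditions` equals its value.

    `conditions` is a hypothetical dict of the form {column: scalar}.
    """
    cols = list(conditions)
    # isin expects a collection per column, so wrap each scalar in a list.
    wrapped = {col: [val] for col, val in conditions.items()}
    # Test only the listed columns: isin gives False for columns
    # that are absent from the dict, which would break .all(axis=1).
    mask = df[cols].isin(wrapped).all(axis=1)
    return df[mask]


def filter_equal_query(df, conditions):
    """Same filter, built as a query string (assumes identifier-like names)."""
    expr = " and ".join(f"{col} == {val!r}" for col, val in conditions.items())
    return df.query(expr)


# Small made-up frame for illustration.
df = pd.DataFrame({"a": [6, 3, 3], "b": [3, 0, 0], "c": [1, 8, 5]})

subset = filter_equal(df, {"a": 3, "b": 0})
subset_q = filter_equal_query(df, {"a": 3, "b": 0})
```

Both calls keep the rows where a is 3 and b is 0, whatever the value of c. An empty result is simply an empty DataFrame, so no try/except is needed around the filter.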
Upvotes: 3