Reputation: 3190
I have a pandas DataFrame, and I want to select rows for which a certain set of values are True.
df = pd.DataFrame([[True, True, True], [True, False, True], [True, True, False]],
                  index=['a', 'b', 'c'], columns=['val1', 'val2', 'other'])
# example DataFrame:
   val1   val2  other
a  True   True   True
b  True  False   True
c  True   True  False
I want the rows for which every column whose name contains the string 'val' has a value of True. In the case of this example DataFrame, that means I care about the values in columns 'val1' and 'val2'. This selection is easy if I know the specific names of those two columns:
cond = df['val1'] & df['val2']
df[cond]
# gives me:
   val1  val2  other
a  True  True   True
c  True  True  False
So far so good. However, in my actual use case I don't know how many 'val' columns there will be, or what their exact names are.
I know how to get all of the 'val' columns:
df.columns[df.columns.str.contains('val')]
# gives me:
Index([u'val1', u'val2'], dtype='object')
But how do I put these together? Given an arbitrary-length list of column names, how can I use boolean indexing to get rows for which those columns have True values?
Upvotes: 0
Views: 552
Reputation: 210842
You can do it using the filter() and all() methods:
In [194]: df[df.filter(like='val').all(axis=1)]
Out[194]:
   val1  val2  other
a  True  True   True
c  True  True  False
Upvotes: 1
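For reference, a self-contained sketch of this filter/all approach on the question's example frame (column names as in the question):

```python
import pandas as pd

df = pd.DataFrame([[True, True, True], [True, False, True], [True, True, False]],
                  index=['a', 'b', 'c'], columns=['val1', 'val2', 'other'])

# filter(like='val') keeps only the columns whose name contains 'val';
# all(axis=1) is True for the rows where every such column is True.
mask = df.filter(like='val').all(axis=1)
print(df[mask].index.tolist())  # ['a', 'c']
```

`filter(like=...)` does a substring match on the column labels; use `filter(regex=...)` if you need a pattern instead.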
Reputation: 42875
Use .all(axis=1) (see the docs):
df[df.loc[:, df.columns.str.contains('val')].all(axis=1)]
Sample data for illustration:
import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.random.choice([True, False], (5, 6)),
                  columns=['val_{}'.format(i) for i in range(5)] + ['other'])
   val_0  val_1  val_2  val_3  val_4  other
0   True   True   True   True   True  False
1  False   True   True   True  False  False
2  False  False   True  False   True   True
3   True  False  False  False   True   True
4  False   True   True   True  False  False
df[df.loc[:, df.columns.str.contains('val')].all(axis=1)]
   val_0  val_1  val_2  val_3  val_4  other
0   True   True   True   True   True  False
Upvotes: 1
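This also answers the more general form of the question: given an arbitrary-length list of column names (however it was obtained), the same pattern applies. A minimal sketch, reusing the question's example frame:

```python
import pandas as pd

df = pd.DataFrame([[True, True, True], [True, False, True], [True, True, False]],
                  index=['a', 'b', 'c'], columns=['val1', 'val2', 'other'])

# Any list of column names works here; this one comes from the
# str.contains() expression the question already has.
cols = df.columns[df.columns.str.contains('val')].tolist()  # ['val1', 'val2']

# df[cols] selects those columns; .all(axis=1) reduces each row to one bool
result = df[df[cols].all(axis=1)]
print(result.index.tolist())  # ['a', 'c']
```

Swap `.all(axis=1)` for `.any(axis=1)` if you instead want rows where at least one of the selected columns is True.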