J Jones
J Jones

Reputation: 3190

Selecting rows from a pandas DataFrame with an arbitrary number of conditions

I have a pandas DataFrame, and I want to select rows for which a certain set of values are True.

df = DataFrame([[True, True, True], [True, False, True], [True, True, False]], index=['a', 'b', 'c'], columns=['val1', 'val2', 'other'])#, columns=['val1', 'other'])

# example DataFrame:

    val1    val2    other
a   True    True    True
b   True    False   True
c   True    True    False

I want the rows for which any column whose name contains the string 'val' has a value of True. In the case of this example DataFrame, this means that I care about the values in columns 'val1' and 'val2'. This selection is easy if I know the specific names of those two columns:

cond = df['val1'] & df['val2']
df[cond]

# gives me:

    val1    val2    other
a   True    True    True
c   True    True    False

So far so good. However, in my actual use case I don't know how many 'val' columns there will be, or what their exact names are.

I know how to get all of the 'val' columns:

df.columns[df.columns.str.contains('val')]

# gives me:

Index([u'val1', u'val2'], dtype='object')

But how do I put these together? Given an arbitrary-length list of column names, how can I use boolean indexing to get rows for which those columns have True values?

Upvotes: 0

Views: 552

Answers (2)

MaxU - stand with Ukraine
MaxU - stand with Ukraine

Reputation: 210842

you can do it using filter() and all() functions:

In [194]: df[df.filter(like='val').all(axis=1)]
Out[194]:
   val1  val2  other
a  True  True   True
c  True  True  False

Upvotes: 1

Stefan
Stefan

Reputation: 42875

Use .all(axis=1) (see docs):

df[df.loc[:, df.columns.str.contains('val')].all(axis=1)]

Sample data for illustration:

df = pd.DataFrame(data=np.random.choice([True, False], (5, 6)), columns=['val_{}'.format(i) for i in range(5)] + ['other'])

   val_0  val_1  val_2  val_3  val_4  other
0   True   True   True   True   True  False
1  False   True   True   True  False  False
2  False  False   True  False   True   True
3   True  False  False  False   True   True
4  False   True   True   True  False  False

df[df.loc[:, df.columns.str.contains('val')].all(1)]

  val_0 val_1 val_2 val_3 val_4  other
0  True  True  True  True  True  False

Upvotes: 1

Related Questions