not link
not link

Reputation: 870

multiple multi-column selections in pandas

I am trying do multiple multi-column selections from a pandas dataframe and can't seem to find an elegant way to do accomplish it.

Set up:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randint(0,100,size=(100, 8)),columns=list('ABCDEFGH'))

Let's say I want to find the rows of the dataframe for which columns 'A' and 'B' are greater than 50, I would do something like this:

df[['A', 'B']].where(lambda x: x > 50).dropna()

Now let's say I also want to select on columns 'E' and 'F' where they are less than 25.

How do I accomplish this?

Note: In the actual dataset ['A', 'B'] can be a set of 10+ columns that are non-sequential, and ['E', 'F'] can also be a set of 10+ columns that are non-sequential. I have lists of column names for each set of criteria. So I want to avoid ('A' > 50) & ('B' > 50) as I would then have to string together 30+ boolean criteria.

Upvotes: 1

Views: 106

Answers (1)

Omar
Omar

Reputation: 329

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(0,100,size=(100, 8)),columns=list('ABCDEFGH'))

Define the columns you want to constrain:

columns1 = ['A', 'B', 'C', 'E']
columns2 = ['D', 'F']

Define and apply constraints

constraint1 = (df[columns1] > 50).all(axis =1)
constraint2 = (df[columns2] < 50).any(axis =1)
print df[constraint1 & constraint2]

Note:

.any(axis=1) acts as OR operator when applying the constraint to columns.

.all(axis=1) acts as AND operator when applying the constraint to columns.

Upvotes: 1

Related Questions