Reputation: 870
I am trying do multiple multi-column selections from a pandas dataframe and can't seem to find an elegant way to do accomplish it.
Set up:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(0,100,size=(100, 8)),columns=list('ABCDEFGH'))
Let's say I want to find the rows of the dataframe for which columns 'A' and 'B' are greater than 50, I would do something like this:
df[['A', 'B']].where(lambda x: x > 50).dropna()
Now let's say I also want to select on columns 'E' and 'F' where they are less than 25.
How do I accomplish this?
Note: In the actual dataset ['A', 'B'] can be a set of 10+ columns that are non-sequential, and ['E', 'F'] can also be a set of 10+ columns that are non-sequential. I have lists of column names for each set of criteria. So I want to avoid ('A' > 50) & ('B' > 50)
as I would then have to string together 30+ boolean criteria.
Upvotes: 1
Views: 106
Reputation: 329
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(0,100,size=(100, 8)),columns=list('ABCDEFGH'))
Define the columns you want to constrain:
columns1 = ['A', 'B', 'C', 'E']
columns2 = ['D', 'F']
Define and apply constraints
constraint1 = (df[columns1] > 50).all(axis =1)
constraint2 = (df[columns2] < 50).any(axis =1)
print df[constraint1 & constraint2]
Note:
.any(axis=1) acts as OR operator when applying the constraint to columns.
.all(axis=1) acts as AND operator when applying the constraint to columns.
Upvotes: 1