Reputation: 49
I have a dataframe with multiple columns. I want to a "double sort" where within the lowest 50%ile of column A, I extract the lowest 50%ile of column B.
A B
3 1.0
5 2.0
7 0.5
9 2.1
In this example, the 50th percentile of A would give me the first two rows. Then, the 50th percentile of B of those two would be 1.5. Thus I should return something like column C:
A B C
3 1.0 True
5 2.0 False
7 0.5 False
9 2.1 False
In this manner, it is important that the third row does not become true.
Any help is much appreciated!
Upvotes: 1
Views: 391
Reputation: 19957
Is this what you are after?
(
df.assign(C=df.A.lt(df.A.quantile(0.5)))
.assign(C=lambda x: x.C & x.B.lt(x.loc[x.C].B.quantile(0.5)))
)
A B C
0 3 1.1 False
1 5 0.9 True
2 7 2.0 False
3 9 2.1 False
The first assign creates a flag to indicate whether A is below 50% quantile.
The second assign does 2 things:
Upvotes: 2