mdawg
mdawg

Reputation: 49

Get quantile of column only if value of another column satisfies condition

I have a dataframe with multiple columns. I want to a "double sort" where within the lowest 50%ile of column A, I extract the lowest 50%ile of column B.

A      B
3     1.0
5     2.0
7     0.5
9     2.1

In this example, the 50th percentile of A would give me the first two rows. Then, the 50th percentile of B of those two would be 1.5. Thus I should return something like column C:

A      B     C
3     1.0    True
5     2.0    False
7     0.5    False
9     2.1    False

In this manner, it is important that the third row does not become true.

Any help is much appreciated!

Upvotes: 1

Views: 391

Answers (1)

Allen Qin
Allen Qin

Reputation: 19957

Is this what you are after?

(
    df.assign(C=df.A.lt(df.A.quantile(0.5)))
    .assign(C=lambda x: x.C & x.B.lt(x.loc[x.C].B.quantile(0.5)))
)

    A   B   C
0   3   1.1 False
1   5   0.9 True
2   7   2.0 False
3   9   2.1 False

The first assign creates a flag to indicate whether A is below 50% quantile.

The second assign does 2 things:

  1. Check if B is below the 50% quantile of the subset of B filtered by first condition
  2. Do a logical AND of the flag(C) and the results of step 1 above and update column C.

Upvotes: 2

Related Questions