A. K.
A. K.

Reputation: 25

How to keep values over a percentile based on a condition on another column in pandas dataframe

Please read this question carefully, considering all aspects, before marking it as duplicate. I also had a stack account but seem to have forgotten on which email, thus a brand new one!

I have a df with multiple columns and millions (literally) rows. Here's an example so I can be more specific about my question without revealing sensitive data:

a     b     c     d   ...   AAA      BBB   

---   ---  ---   ---  ...   hf4r     192
---   ---  ---   ---  ...   h23g     103
---   ---  ---   ---  ...   9fyd      7
---   ---  ---   ---  ...   hf4r     121
---   ---  ---   ---  ...   hf4r      32
---   ---  ---   ---  ...   h23g      12
...............             ............

I want to only keep those rows whose BBB value is larger than or equal to the 80th percentile of BBBs for their specific AAA; for all AAA.

I work with pandas. I tried in-line fors and .querys and just regular calls, but I must be doing something wrong because each time my compiler doesn't like one thing or the other. Will appreciate any insights.

Upvotes: 0

Views: 1153

Answers (2)

IMCoins
IMCoins

Reputation: 3306

I would have done this, even though I upvoted Wen's answer because I didn't know that transform could do this. :)

df.loc[df.BBB >= df.describe(percentiles = [.8]).iloc[5]['BBB']]

Full code :

import pandas as pd

data = {
    'AAA' : ['hf4r', 'hf4r', 'hf4r', 'hf4r', 'hf4r', 'hf4r'],
    'BBB' : [1, 2, 3, 4, 5, 6]
}

df = pd.DataFrame(data)
selected_values = df.loc[df.BBB >= df.describe(percentiles = [.8]).iloc[5]['BBB']]
print(df)
print(selected_values)

#     AAA  BBB
# 0  hf4r    1
# 1  hf4r    2
# 2  hf4r    3
# 3  hf4r    4
# 4  hf4r    5
# 5  hf4r    6

#     AAA  BBB
# 4  hf4r    5
# 5  hf4r    6

Upvotes: 0

BENY
BENY

Reputation: 323226

Seems like you need transform with quantile

df[df.BBB>=df.groupby('AAA').BBB.transform('quantile',0.8)]

Upvotes: 2

Related Questions