Reputation: 25
Please read this question carefully, considering all aspects, before marking it as duplicate. I also had a stack account but seem to have forgotten on which email, thus a brand new one!
I have a df
with multiple columns and millions (literally) rows. Here's an example so I can be more specific about my question without revealing sensitive data:
a b c d ... AAA BBB
--- --- --- --- ... hf4r 192
--- --- --- --- ... h23g 103
--- --- --- --- ... 9fyd 7
--- --- --- --- ... hf4r 121
--- --- --- --- ... hf4r 32
--- --- --- --- ... h23g 12
............... ............
I want to only keep those rows whose BBB
value is larger than or equal to the 80th percentile of BBB
s for their specific AAA
; for all AAA
.
I work with pandas. I tried in-line for
s and .query
s and just regular calls, but I must be doing something wrong because each time my compiler doesn't like one thing or the other. Will appreciate any insights.
Upvotes: 0
Views: 1153
Reputation: 3306
I would have done this, even though I upvoted Wen's answer because I didn't know that transform could do this. :)
df.loc[df.BBB >= df.describe(percentiles = [.8]).iloc[5]['BBB']]
Full code :
import pandas as pd
data = {
'AAA' : ['hf4r', 'hf4r', 'hf4r', 'hf4r', 'hf4r', 'hf4r'],
'BBB' : [1, 2, 3, 4, 5, 6]
}
df = pd.DataFrame(data)
selected_values = df.loc[df.BBB >= df.describe(percentiles = [.8]).iloc[5]['BBB']]
print(df)
print(selected_values)
# AAA BBB
# 0 hf4r 1
# 1 hf4r 2
# 2 hf4r 3
# 3 hf4r 4
# 4 hf4r 5
# 5 hf4r 6
# AAA BBB
# 4 hf4r 5
# 5 hf4r 6
Upvotes: 0
Reputation: 323226
Seems like you need transform
with quantile
df[df.BBB>=df.groupby('AAA').BBB.transform('quantile',0.8)]
Upvotes: 2