Reputation: 330
I have a dataframe:
Column1 Column2
a 0.34
b 0.25
c 0.75
d 1.5
e 0.31
f 2.45
g 7.89
How can I select the top 25% of rows, i.e. those with the highest values in Column2?
For example, with 7 rows, the top 25% would be 1.75 rows, which rounds to 2.
Expected output:
Column1 Column2
g 7.89
f 2.45
Upvotes: 0
Views: 2919
Reputation: 42946
As you clarified in the comments ("25% highest values"), this is basically the values at or above the 75th percentile (the 0.75 quantile), so we can use Series.quantile:
q75 = df['Column2'].quantile(q=0.75)
df[df['Column2'].ge(q75)]
Or shorter, with DataFrame.query:
df.query('Column2 >= Column2.quantile(q=0.75)')
Column1 Column2
5 f 2.45
6 g 7.89
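For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds the example frame from the question and applies the quantile filter. It also shows an alternative that is not part of the answer above: picking a fixed number of rows with DataFrame.nlargest, which matches the "1.75 ~ 2" arithmetic in the question more literally. Rounding up with math.ceil is an assumption on my part; the question does not say which rounding rule it wants. The names top_by_quantile, n_rows and top_by_count are only illustrative.

```python
import math

import pandas as pd

# Example data copied from the question.
df = pd.DataFrame({
    "Column1": list("abcdefg"),
    "Column2": [0.34, 0.25, 0.75, 1.5, 0.31, 2.45, 7.89],
})

# Quantile-based: keep rows at or above the 0.75 quantile of Column2.
q75 = df["Column2"].quantile(q=0.75)
top_by_quantile = df[df["Column2"].ge(q75)]

# Count-based alternative (assumption: round 25% of the row count up).
n_rows = math.ceil(0.25 * len(df))  # 0.25 * 7 = 1.75, rounded up to 2
top_by_count = df.nlargest(n_rows, "Column2")

print(top_by_quantile)
print(top_by_count)
```

On this data both approaches select rows f and g. They can differ on other data: the quantile filter keeps every row tied at the threshold, while nlargest returns exactly n_rows rows by default and sorts them in descending order.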
Upvotes: 3
Reputation: 323396
We can use pd.qcut to split the values into four quantile bins and keep the top one:
df[pd.qcut(df.Column2,q=4,labels=[1,2,3,4])==4]
Column1 Column2
5 f 2.45
6 g 7.89
For reference, these are the bins that qcut assigns to each row:
pd.qcut(df.Column2,q=4)
0 (0.325, 0.75]
1 (0.249, 0.325]
2 (0.325, 0.75]
3 (0.75, 1.975]
4 (0.249, 0.325]
5 (1.975, 7.89]
6 (1.975, 7.89]
Name: Column2, dtype: category
Categories (4, interval[float64]): [(0.249, 0.325] < (0.325, 0.75] < (0.75, 1.975] < (1.975, 7.89]]
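A self-contained variant of the same idea, as a sketch only: it rebuilds the example frame from the question and passes labels=False so that qcut returns integer bin codes (0 to 3) instead of interval labels. The names codes and top_bin are illustrative, not from the answer above.

```python
import pandas as pd

# Example data copied from the question.
df = pd.DataFrame({
    "Column1": list("abcdefg"),
    "Column2": [0.34, 0.25, 0.75, 1.5, 0.31, 2.45, 7.89],
})

# labels=False makes qcut return the integer bin index of each row
# (0 = lowest quartile, 3 = highest quartile).
codes = pd.qcut(df["Column2"], q=4, labels=False)

# Keep only the rows that fall into the highest quartile bin.
top_bin = df[codes == 3]

print(codes.tolist())
print(top_bin)
```

One thing to be aware of: the qcut bins are closed on the right, so a value exactly equal to the 0.75 quantile lands in the third bin rather than the top one. A `>=` comparison against the quantile would include such a value, so the two approaches can disagree on ties at the boundary.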
Upvotes: 5