Reputation: 595
I have a pandas DataFrame that looks like this:
| | Category | Value |
|---:|:-----------|--------:|
| 0 | Apple | 0.25 |
| 1 | Apple | 0.12 |
| 2 | Apple | 0.05 |
| 3 | Orange | 0.7 |
| 4 | Pear | 0.3 |
| 5 | Pear | 0.15 |
I would like to keep only a certain percentage of the rows in each category: each category should keep its top 15% of records by Value, and the rest should be dropped. How can I do that? The values above are not percentages, just arbitrary numbers. The higher the value, the more important it is.
Upvotes: 1
Views: 456
Reputation: 323396
You can use `transform` with `quantile`:
sub_df = df.loc[df.Value >= df.groupby('Category').Value.transform(pd.Series.quantile, 0.15)]
Category Value
0 Apple 0.25
1 Apple 0.12
3 Orange 0.70
4 Pear 0.30
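
Note that a threshold of `0.15` keeps every row at or above its group's 15th percentile, which is roughly the top 85% of each category. If "top 15%" is meant as the highest 15% of values per category, the 0.85 quantile is probably the threshold to use instead. A minimal sketch under that assumption (the names `threshold` and `top_15` are only illustrative, and the DataFrame is rebuilt from the table in the question):

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Category": ["Apple", "Apple", "Apple", "Orange", "Pear", "Pear"],
    "Value": [0.25, 0.12, 0.05, 0.7, 0.3, 0.15],
})

# Per-group 85th percentile, broadcast back onto every row of that group.
threshold = df.groupby("Category")["Value"].transform(lambda s: s.quantile(0.85))

# Keep only rows at or above their own group's threshold.
top_15 = df.loc[df["Value"] >= threshold]
print(top_15)
```

On the sample data this keeps rows 0, 3, and 4, one per category. `quantile` interpolates linearly by default, so with groups this small the cut-off falls between the two largest values and only the largest one survives.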
Upvotes: 1