EchoCache

Reputation: 595

How to filter a Pandas DataFrame for the top 15% of rows per category?

I have a Pandas DataFrame that looks like this:

|    | Category   |   Value |
|---:|:-----------|--------:|
|  0 | Apple      |    0.25 |
|  1 | Apple      |    0.12 |
|  2 | Apple      |    0.05 |
|  3 | Orange     |    0.7  |
|  4 | Pear       |    0.3  |
|  5 | Pear       |    0.15 |

Now I would like to keep only the rows that fall within a certain top percentage of each category. That is, each category should keep its top 15% of records by Value, and the rest should be dropped. How would that work? The values above are not percentages, just random values. The higher the value, the more important the category is.

Upvotes: 1

Views: 456

Answers (1)

BENY

Reputation: 323396

You can transform with quantile:

sub_df = df.loc[df.Value >= df.groupby('Category').Value.transform(pd.Series.quantile, 0.15)]
  Category  Value
0    Apple   0.25
1    Apple   0.12
3   Orange   0.70
4     Pear   0.30
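
One thing worth checking is the direction of the cutoff. Comparing against the 0.15 quantile keeps every row at or above a group's 15th percentile, so it drops roughly the bottom 15% of each category. If the goal is strictly the top 15%, the threshold would presumably be the 0.85 quantile instead. The sketch below rebuilds the sample data from the question and runs both cutoffs; the helper name `filter_by_group_quantile` is made up for illustration and is not part of pandas.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Category": ["Apple", "Apple", "Apple", "Orange", "Pear", "Pear"],
    "Value": [0.25, 0.12, 0.05, 0.7, 0.3, 0.15],
})


def filter_by_group_quantile(frame, q):
    """Hypothetical helper: keep rows whose Value is at or above
    the q-th quantile of their own Category."""
    # transform broadcasts each group's quantile back to the rows of that group
    threshold = frame.groupby("Category")["Value"].transform(lambda s: s.quantile(q))
    return frame.loc[frame["Value"] >= threshold]


# Same cutoff as the answer: drops values below each group's 15th percentile.
bottom_15_dropped = filter_by_group_quantile(df, 0.15)

# Stricter cutoff: keeps only values at or above each group's 85th percentile.
top_15_kept = filter_by_group_quantile(df, 0.85)

print(bottom_15_dropped)
print(top_15_kept)
```

On the sample data, `q=0.15` reproduces the four rows shown above, while `q=0.85` keeps only rows 0, 3 and 4. A category with a single row (Orange here) always survives, because that value is its own quantile.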

Upvotes: 1
