metersk

Reputation: 12539

Binning pandas data by top N percent

I have a pandas series (as part of a larger data frame) like the below:

0        7416
1       10630
2        7086
3        2091
4        3995
5        1304
6         519
7        1262
8        3676
9        2371
10       5346
11        912
12       3653
13       1093
14       2986
15       2951
16      11859

I would like to group rows based on the following quantiles:

Top 0-5%
Top 6-10%
Top 11-25%
Top 26-50%
Top 51-75%
Top 76-100%

I started by calling .rank() on the data and planned to use pd.cut() to cut the ranks into bins, but pd.cut() does not seem to accept top N%; it only accepts explicit bin edges. Is there an easy way to do this in pandas, or do I need to write a lambda/apply function that calculates which bin each ranked item belongs in?

Upvotes: 7

Views: 4511

Answers (2)

udothemath

Reputation: 137

Slightly modified version of crow_t_robot's answer, with 0 added as the lowest edge:

pd.qcut(data, [0, 0.05, 0.1, 0.25, 0.5, 0.75, 1])

Without the leading 0, values below the 0.05 (5%) quantile fall outside every bin and come back as NaN.
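
A quick way to see the difference is to run both edge lists on the series from the question. This is only a sketch: the names data, edges, without_zero and with_zero are mine, and the values are copied from the question.

```python
import pandas as pd

# Values copied from the question; index 0..16.
data = pd.Series([7416, 10630, 7086, 2091, 3995, 1304, 519, 1262, 3676,
                  2371, 5346, 912, 3653, 1093, 2986, 2951, 11859])

# Quantile edges without and with the leading 0.
edges = [0.05, 0.1, 0.25, 0.5, 0.75, 1]
without_zero = pd.qcut(data, edges)
with_zero = pd.qcut(data, [0] + edges)

# Count the rows that did not land in any bin.
print(without_zero.isna().sum())
print(with_zero.isna().sum())

# Rows per bin, in bin order, when the 0 edge is present.
print(with_zero.value_counts(sort=False))
```

On this data the only row left unbinned without the 0 edge should be the smallest value (519, at index 6).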

Upvotes: 0

crow_t_robot

Reputation: 524

Is this what you had in mind?

pd.qcut(data, [0.05, 0.1, 0.25, 0.5, 0.75, 1])
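
If "top" means the largest values, one possible variation is to mirror the quantile edges and attach labels, so the highest values land in "Top 0-5%". This is a sketch under that assumption, not something the question confirms; the names data, top_edges, top_labels and binned are made up for illustration, and a leading 0 edge is included so the smallest values also get a bin.

```python
import pandas as pd

# Values copied from the question; index 0..16.
data = pd.Series([7416, 10630, 7086, 2091, 3995, 1304, 519, 1262, 3676,
                  2371, 5346, 912, 3653, 1093, 2986, 2951, 11859])

# Edges mirrored from [0, 0.05, 0.1, 0.25, 0.5, 0.75, 1] so that the
# last bin (95th percentile and up) holds the largest values.
top_edges = [0, 0.25, 0.5, 0.75, 0.9, 0.95, 1]

# Labels run from the lowest bin to the highest, matching the edge order.
top_labels = ["Top 76-100%", "Top 51-75%", "Top 26-50%",
              "Top 11-25%", "Top 6-10%", "Top 0-5%"]

binned = pd.qcut(data, top_edges, labels=top_labels)

# Show each value next to its assigned group, largest first.
print(pd.DataFrame({"value": data, "group": binned})
        .sort_values("value", ascending=False))
```

The result is a categorical series, so it can be assigned back as a new column of the data frame and passed to groupby.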

Upvotes: 12
