Reputation: 12539
I have a pandas series (as part of a larger data frame) like the below:
0 7416
1 10630
2 7086
3 2091
4 3995
5 1304
6 519
7 1262
8 3676
9 2371
10 5346
11 912
12 3653
13 1093
14 2986
15 2951
16 11859
I would like to group rows based on the following quantiles:
Top 0-5%
Top 6-10%
Top 11-25%
Top 26-50%
Top 51-75%
Top 76-100%
First I used Series.rank() on the data, and then I planned to use pd.cut() to cut the data into bins, but pd.cut() does not seem to accept top N%; rather, it accepts explicit bin edges. Is there an easy way to do this in pandas, or do I need to write a lambda/apply function that calculates which bin each ranked item should be placed in?
Upvotes: 7
Views: 4511
Reputation: 137
Slightly modified version:
pd.qcut(data, [0, 0.05, 0.1, 0.25, 0.5, 0.75, 1])
Without the leading 0, any value below the 0.05 (5%) quantile comes back as NaN.
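To see the difference on the sample data from the question, here is a small sketch. The variable names (`without_zero`, `with_zero`) are mine and only for illustration.

```python
import pandas as pd

# Sample values copied from the question.
data = pd.Series([7416, 10630, 7086, 2091, 3995, 1304, 519, 1262, 3676,
                  2371, 5346, 912, 3653, 1093, 2986, 2951, 11859])

# Edges start at the 5% quantile, so anything below it falls outside every bin.
without_zero = pd.qcut(data, [0.05, 0.1, 0.25, 0.5, 0.75, 1])

# Edges start at 0, so the minimum value is covered by the first bin.
with_zero = pd.qcut(data, [0, 0.05, 0.1, 0.25, 0.5, 0.75, 1])

# Rows that received no bin in each variant.
print(data[without_zero.isna()])
print(data[with_zero.isna()])
```

On this data the smallest value (519, at index 6) is the one left without a bin when the 0 edge is missing.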
Upvotes: 0
Reputation: 524
Is this what you had in mind?
pd.qcut(data, [0.05, 0.1, 0.25, 0.5, 0.75, 1])
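One thing to watch: qcut counts quantiles from the smallest value up. If "Top 0-5%" is meant as the largest values, the edges have to be mirrored (0.95 to 1 is the top 5%). A minimal sketch under that assumption; the label strings, the leading 0 edge, and the name `top_groups` are my additions, not part of the question:

```python
import pandas as pd

# Sample values copied from the question.
data = pd.Series([7416, 10630, 7086, 2091, 3995, 1304, 519, 1262, 3676,
                  2371, 5346, 912, 3653, 1093, 2986, 2951, 11859])

# Quantile edges measured from the bottom; the last bin (0.95-1) is the top 5%.
edges = [0, 0.25, 0.5, 0.75, 0.9, 0.95, 1]

# Labels listed in the same bottom-to-top order as the bins.
labels = ["Top 76-100%", "Top 51-75%", "Top 26-50%",
          "Top 11-25%", "Top 6-10%", "Top 0-5%"]

top_groups = pd.qcut(data, edges, labels=labels)

# Group the original values by their assigned bucket.
for name, values in data.groupby(top_groups, observed=True):
    print(name, values.tolist())
```

With only 17 rows the small buckets hold a single value each, so how ties and interpolated edges are handled matters more than it would on a larger frame.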
Upvotes: 12