Maksim Khaitovich

Reputation: 4792

Pandas - cut records with custom percentiles

I have a pandas DataFrame with a column of continuous values. I need to convert them into 3 bins, such that the first bin covers values below the 20th percentile, the second covers values between the 20th and 80th percentiles, and the last covers values above the 80th percentile.

I am trying to do this by first computing the bin boundaries for those percentiles and then using the pandas cut function. The problem is that I get an odd result: only the middle bin is encoded. Please see below:

import numpy as np
import pandas as pd

test = list(range(100))
a = pd.DataFrame(test)

np.percentile(a, [20, 80])
Out[52]: array([ 19.8,  79.2])

pd.cut(a[0], np.percentile(a[0], [20, 80]))

...
15             NaN
16             NaN
17             NaN
18             NaN
19             NaN
20    (19.8, 79.2]
21    (19.8, 79.2]
22    (19.8, 79.2]
...
78    (19.8, 79.2]
79    (19.8, 79.2]
80             NaN

Why is that? I thought pandas cut requires you to supply the boundaries of the bins you want. By supplying 2 boundaries I expected to get 3 bins, but it doesn't seem to work that way.

Upvotes: 3

Views: 6737

Answers (1)

BENY

Reputation: 323276

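If you need 3 bins, then you need 4 break points: pd.cut treats the values you pass as bin edges, so 2 edges define only 1 bin and everything outside it becomes NaN.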

test = [x for x in range(0,100)]
a = pd.DataFrame(test)
np.percentile(a, [0,20, 80,100])
Out[527]: array([ 0. , 19.8, 79.2, 99. ])
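# include_lowest=True keeps the minimum (0) in the first bin; by default the left edge is open and 0 would be NaN
pd.cut(a[0], np.percentile(a[0], [0, 20, 80, 100]), include_lowest=True)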

Also, pandas has qcut, which takes the quantiles directly, so you do not need to get the bin edges from numpy:

pd.qcut(a[0],[0,0.2,0.8,1])
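
As a rough check that the two approaches agree, here is a minimal sketch on the same 0..99 data. The label names ("low", "mid", "high") and the variable names are assumptions added for illustration; they are not part of the original question. It also counts how many values fall outside the bins when pd.cut is called with the four percentile edges but without include_lowest=True.

```python
import numpy as np
import pandas as pd

# Same data as in the question: the integers 0..99.
s = pd.Series(range(100))

# Hypothetical label names, chosen only for this illustration.
labels = ["low", "mid", "high"]

# Approach 1: compute the four edges with numpy, then bin with pd.cut.
# include_lowest=True closes the left edge so the minimum value is binned.
edges = np.percentile(s, [0, 20, 80, 100])
by_cut = pd.cut(s, edges, labels=labels, include_lowest=True)

# Approach 2: let pd.qcut compute the quantile edges itself.
by_qcut = pd.qcut(s, [0, 0.2, 0.8, 1], labels=labels)

# Count how many records landed in each bin under both approaches.
cut_counts = by_cut.value_counts().to_dict()
qcut_counts = by_qcut.value_counts().to_dict()

# With the default open left edge, the minimum value is left unbinned.
missing_without_flag = int(pd.cut(s, edges).isna().sum())

print(cut_counts)
print(qcut_counts)
print(missing_without_flag)
```

If ties or repeated values make some quantile edges coincide, pd.qcut raises an error unless duplicates="drop" is passed, so real data may need that extra argument.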

Upvotes: 7
