melik
melik

Reputation: 1332

python pandas find percentile for a group in column

I would like to find percentile of each column and add to df data frame and also label

if the value of the column is

top 20 percent (value>80th percentile) then 'strong'

below 20 percent (value>80th percentile) then 'weak'

else average

Below is my dataframe

  df=pd.DataFrame({'month':['1','1','1','1','1','2','2','2','2','2','2','2'],'X1': 
  [30,42,25,32,12,10,4,6,5,10,24,21],'X2':[10,76,100,23,65,94,67,24,67,54,87,81],'X3': 
  [23,78,95,52,60,76,68,92,34,76,34,12]})
  df

Below what I tried

  df['X1_percentile'] = df.X1.rank(pct = True)
  df['X1_segment'] = np.where(df['X1_percentile']>0.8, 'Strong',np.where(df['X1_percentile'] 
  <0.20,'Weak', 'Average')) 

But I would like to do this for each month and for each column. And if possible this could be automted by a function for any col numbers and also type colname+"_per" and colname+"_segment" for each column ? Thanks

Upvotes: 1

Views: 2768

Answers (1)

Shubham Sharma
Shubham Sharma

Reputation: 71689

We can use groupby + rank with optional parameter pct=True to calculate the ranking expressed as percentile rank, then using np.select bin/categorize the percentile values into discrete lables.

p = df.groupby('month').rank(pct=True)

df[p.columns + '_per'] = p
df[p.columns + '_seg'] = np.select([p.gt(.8), p.lt(.2)], ['strong', 'weak'], 'average')

   month  X1   X2  X3    X1_per    X2_per    X3_per   X1_seg   X2_seg   X3_seg
0      1  30   10  23  0.600000  0.200000  0.200000  average  average  average
1      1  42   76  78  1.000000  0.800000  0.800000   strong  average  average
2      1  25  100  95  0.400000  1.000000  1.000000  average   strong   strong
3      1  32   23  52  0.800000  0.400000  0.400000  average  average  average
4      1  12   65  60  0.200000  0.600000  0.600000  average  average  average
5      2  10   94  76  0.642857  1.000000  0.785714  average   strong  average
6      2   4   67  68  0.142857  0.500000  0.571429     weak  average  average
7      2   6   24  92  0.428571  0.142857  1.000000  average     weak   strong
8      2   5   67  34  0.285714  0.500000  0.357143  average  average  average
9      2  10   54  76  0.642857  0.285714  0.785714  average  average  average
10     2  24   87  34  1.000000  0.857143  0.357143   strong   strong  average
11     2  21   81  12  0.857143  0.714286  0.142857   strong  average     weak

Upvotes: 3

Related Questions