mbs1

Reputation: 317

Applying stats.percentileofscore to every row by column

df=

1
5
34
5
67
8
98

I need a new column holding the percentile score of each element relative to its column. The final result should look like the table below, with the output of stats.percentileofscore() stored in the pcntle_rank column. I thought about using apply, but how do I pass the required parameters to percentileofscore?

df =

value    pcntle_rank
1        stats.percentileofscore(df['value'], df['value'][0])
5        stats.percentileofscore(df['value'], df['value'][1])
34       stats.percentileofscore(df['value'], df['value'][2])
5        stats.percentileofscore(df['value'], df['value'][3])
67       stats.percentileofscore(df['value'], df['value'][4])
8        stats.percentileofscore(df['value'], df['value'][5])
98       stats.percentileofscore(df['value'], df['value'][6])

This is my attempt. The real data has 50 columns and 4000 rows, and I need the percentile rank of every value within each column.

  for i in range(df.shape[0]):
      # .loc avoids chained assignment; the column is passed as a 1-D Series
      df.loc[i, 'pcntle_rank'] = stats.percentileofscore(df['value'], df['value'][i])

My loop gives results but I want to do it without a for loop.

Upvotes: 3

Views: 950

Answers (1)

ALollz

Reputation: 59569

Series.rank

With pct=True, Series.rank returns each value's rank as a fraction of the number of non-missing values; multiplied by 100, this matches stats.percentileofscore with the default kind='rank'.

df[0].rank(pct=True)*100
#0     14.285714
#1     35.714286
#2     71.428571
#3     35.714286
#4     85.714286
#5     57.142857
#6    100.000000
#Name: 0, dtype: float64
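Since the question mentions 50 columns, the same idea should extend to the whole frame: DataFrame.rank ranks each column independently by default. Below is a minimal sketch, not a definitive solution; the column names 'a' and 'b' and the '_pcntle_rank' suffix are made up for illustration.

```python
import pandas as pd

# Hypothetical two-column frame standing in for the real 50-column data
df = pd.DataFrame({'a': [1, 5, 34, 5, 67, 8, 98],
                   'b': [10, 20, 30, 40, 50, 60, 70]})

# Rank every column independently (axis=0 is the default);
# tied values receive the average of their ranks
pct = df.rank(pct=True) * 100

# Put the results next to the original columns, using an assumed suffix
result = df.join(pct.add_suffix('_pcntle_rank'))
print(result)
```

This avoids calling percentileofscore once per cell, which matters more as the number of rows and columns grows.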

from scipy import stats

# Series.iteritems() was removed in pandas 2.0; items() is the replacement
for idx, val in df[0].items():
    print(f'{val}: {stats.percentileofscore(df[0], score=val)}')

#1: 14.285714285714286
#5: 35.714285714285715
#34: 71.42857142857143
#5: 35.714285714285715
#67: 85.71428571428571
#8: 57.142857142857146
#98: 100.0
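On the literal question of how to pass parameters through apply: one option is a lambda that supplies the full column as the first argument and the current element as the score. This is only a sketch, assuming the column is named 'value' as in the question; it still calls percentileofscore once per element, so rank is likely the better choice for large data.

```python
import pandas as pd
from scipy import stats

# Sample data from the question, with the column named 'value' (an assumption)
df = pd.DataFrame({'value': [1, 5, 34, 5, 67, 8, 98]})

# Series.apply hands each element to the lambda, which passes the whole
# column as the reference data and the element as the score
df['pcntle_rank'] = df['value'].apply(
    lambda x: stats.percentileofscore(df['value'], x)
)

# Vectorized route for comparison
df['pcntle_rank_fast'] = df['value'].rank(pct=True) * 100
print(df)
```

Both columns should agree up to floating-point rounding.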

Upvotes: 4
