df =
   value
0      1
1      5
2     34
3      5
4     67
5      8
6     98
I need a new column, pcntle_rank, holding the percentile score of each element with respect to the whole column, i.e. the output of stats.percentileofscore() for that element. I thought about using apply somehow, but how do I pass the required parameters to percentileofscore? The final result should look like this:
df =
   value  pcntle_rank
0      1  stats.percentileofscore(df['value'], df['value'][0])
1      5  stats.percentileofscore(df['value'], df['value'][1])
2     34  stats.percentileofscore(df['value'], df['value'][2])
3      5  stats.percentileofscore(df['value'], df['value'][3])
4     67  stats.percentileofscore(df['value'], df['value'][4])
5      8  stats.percentileofscore(df['value'], df['value'][5])
6     98  stats.percentileofscore(df['value'], df['value'][6])
This is my attempt, which works:

from scipy import stats

df['pcntle_rank'] = 0.0
for i in range(df.shape[0]):
    df.loc[i, 'pcntle_rank'] = stats.percentileofscore(df['value'], df.loc[i, 'value'])

It gives the right results, but the real data has 50 columns and 4000 rows and I need to do this for every row of every column, so I would like to do it without a for loop.
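On the apply idea specifically: Series.apply passes each element as the first argument, but percentileofscore takes the reference array first and the score second, so the array has to be bound beforehand, for example with a lambda. A minimal sketch, assuming the question's column name 'value'; note that it still calls scipy once per element and scans the whole column each time, so it is essentially the loop in disguise:

```python
import pandas as pd
from scipy import stats

# The question's sample data, with the column named 'value'
df = pd.DataFrame({'value': [1, 5, 34, 5, 67, 8, 98]})

# percentileofscore(a, score): the whole column is bound as `a` inside the
# lambda, and apply supplies each element as `score`.
df['pcntle_rank'] = df['value'].apply(
    lambda v: stats.percentileofscore(df['value'], v)
)
print(df)
```

functools.partial(stats.percentileofscore, df['value']) would bind the array the same way, since apply then supplies each element as the second positional argument.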
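Use Series.rank with pct=True. With its default method='average', this is equivalent to stats.percentileofscore with its default kind='rank' (in this example the column is labeled 0; for the question's frame use df['value']):
df[0].rank(pct=True)*100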
#0 14.285714
#1 35.714286
#2 71.428571
#3 35.714286
#4 85.714286
#5 57.142857
#6 100.000000
#Name: 0, dtype: float64
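To write the result straight into the question's pcntle_rank column, and to cover the other kind options of percentileofscore, the tie-handling method of rank can be varied. A sketch, assuming the question's column name 'value'; the weak/strict/mean mapping below follows scipy's documented definitions (kind='weak' counts values <= score, kind='strict' counts values < score, kind='mean' averages the two), and the 'weak' case is spot-checked against scipy:

```python
import numpy as np
import pandas as pd
from scipy import stats

# The question's sample data, with the column named 'value'
df = pd.DataFrame({'value': [1, 5, 34, 5, 67, 8, 98]})

# Default kind='rank' <-> average rank / n, expressed as a percentage
df['pcntle_rank'] = df['value'].rank(pct=True) * 100

# Other `kind` options expressed with other tie-handling methods
n = len(df)
weak = df['value'].rank(method='max') / n * 100          # kind='weak': share of values <= score
strict = (df['value'].rank(method='min') - 1) / n * 100  # kind='strict': share of values < score
mean = (weak + strict) / 2                               # kind='mean': average of the two

# Spot-check the 'weak' mapping against scipy, one call per element
expected_weak = [stats.percentileofscore(df['value'], v, kind='weak') for v in df['value']]
print(np.allclose(weak, expected_weak))  # True
```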
For comparison, the same numbers from scipy:
from scipy import stats
for idx, val in df[0].items():
    print(f'{val}: {stats.percentileofscore(df[0], score=val)}')
#1: 14.285714285714286
#5: 35.714285714285715
#34: 71.42857142857143
#5: 35.714285714285715
#67: 85.71428571428571
#8: 57.142857142857146
#98: 100.0
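Since the real data has 50 columns, note that DataFrame.rank also ranks each column independently (axis=0 by default), so a single call covers the whole frame with no loop. A sketch on a hypothetical random 4000-row, 50-column frame; the colN column names and the _pcntle_rank suffix are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data: 4000 rows x 50 columns of random integers
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.integers(0, 100, size=(4000, 50)),
                    columns=[f'col{i}' for i in range(50)])

# DataFrame.rank works column by column (axis=0 by default), so one call
# computes the percentile rank of every element within its own column.
ranks = data.rank(pct=True) * 100

# Keep the originals and add one *_pcntle_rank column per input column
result = data.join(ranks.add_suffix('_pcntle_rank'))
print(result.shape)  # (4000, 100)
```

If only the ranks are needed, ranks on its own is enough; the join just mirrors the question's layout of keeping each value next to its pcntle_rank.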