Ilja

Reputation: 1053

Pandas - optimize percentile calculation

I have a dataset like this:

id     type     score
a1     ball       15
a2     ball       12
a1     pencil     10
a3     ball       8
a2     pencil     6

I want to rank each id by score within its type, with the highest score getting rank 1. Since I will later convert the ranks into percentiles, I prefer working with ranks.

The output should look something like this:

id     type     score     rank
a1     ball       15       1
a2     ball       12       2
a1     pencil     10       1
a3     ball       8        3
a2     pencil     6        2

So far, I have collected the unique values of type and iterated over them like this:

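unique_type_list = test_data['type'].unique()
test_data['percentile_from_all'] = 0.0
for i in unique_type_list:
    # Boolean mask selecting the rows of the current type
    loc_i = test_data['type'] == i
    # Percentile rank of the score within this type
    percentiles = test_data.loc[loc_i, 'score'].rank(pct=True) * 100
    test_data.loc[loc_i, 'percentile_from_all'] = percentiles.values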

This approach works well for small datasets, but with even 10k unique types (and therefore 10k loop iterations) it becomes too slow. Is there a way to compute everything at once, for example with apply?

Thank you!

Upvotes: 1

Views: 90

Answers (1)

BENY

Reputation: 323236

Use groupby together with rank:

df.groupby('type').score.rank(ascending=False)
Out[67]: 
0    1.0
1    2.0
2    1.0
3    3.0
4    2.0
Name: score, dtype: float64

df['rnk'] = df.groupby('type').score.rank(ascending=False)
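
If the end goal is the percentile rather than the rank, the same groupby can probably replace the loop from the question directly by passing pct=True to rank. A minimal sketch, assuming the sample data from the question and reusing the asker's names test_data and percentile_from_all (the "rank" column name is only illustrative):

```python
import pandas as pd

# Sample data copied from the question.
test_data = pd.DataFrame({
    "id": ["a1", "a2", "a1", "a3", "a2"],
    "type": ["ball", "ball", "pencil", "ball", "pencil"],
    "score": [15, 12, 10, 8, 6],
})

# Group the scores once and reuse the grouped object.
grouped = test_data.groupby("type")["score"]

# Rank within each type; the highest score gets rank 1.
test_data["rank"] = grouped.rank(ascending=False)

# Percentile within each type, mirroring the loop in the question:
# ascending rank divided by the group size, scaled to 0-100.
test_data["percentile_from_all"] = grouped.rank(pct=True) * 100

print(test_data)
```

Both columns are computed in a single vectorized pass over the groups, so there is no Python-level loop over the unique types. Note that rank uses method='average' by default, so tied scores share the mean of their ranks; pass a different method if ties should be handled another way.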

Upvotes: 2
