Reputation: 1053
I have a dataset like this:
id type score
a1 ball 15
a2 ball 12
a1 pencil 10
a3 ball 8
a2 pencil 6
I want to find the rank of each row's score within its type. Since I will later translate the ranks into percentiles, I prefer using rank.
The output should look something like this:
id type score rank
a1 ball 15 1
a2 ball 12 2
a1 pencil 10 1
a3 ball 8 3
a2 pencil 6 2
So far, I have built the unique set of type values and iterated over it like this:
test_data['percentile_from_all'] = 0
for i in unique_type_list:
    # Boolean mask selecting the rows of the current type
    loc_i = test_data['type'] == i
    # Percentile rank of score among those rows only
    percentiles = test_data.loc[loc_i, ['score']].rank(pct=True) * 100
    test_data.loc[loc_i, 'percentile_from_all'] = percentiles.values
This approach works well for small datasets, but with even 10k iterations it becomes too slow. Is there a way to do it all at once, for example with apply?
Thank you!
Upvotes: 1
Views: 90
Reputation: 323236
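Use groupby with rank to rank score within each type in one vectorized call:
df['rnk'] = df.groupby('type').score.rank(ascending=False)
The grouped rank on the right-hand side evaluates to:
Out[67]: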
0 1.0
1 2.0
2 1.0
3 3.0
4 2.0
Name: score, dtype: float64
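Putting the groupby approach together with the percentile column from the question, a minimal sketch might look like the following. It assumes the sample data from the question, reuses the question's column name percentile_from_all, and introduces grouped_score as an illustrative helper name; rank(pct=True) is used per group on the assumption that it should match what the loop computes.

```python
import pandas as pd

# Sample data copied from the question.
test_data = pd.DataFrame({
    "id": ["a1", "a2", "a1", "a3", "a2"],
    "type": ["ball", "ball", "pencil", "ball", "pencil"],
    "score": [15, 12, 10, 8, 6],
})

# Group the score column by type once and reuse the grouped object.
grouped_score = test_data.groupby("type")["score"]

# Rank within each type, highest score first (1.0 = best).
test_data["rank"] = grouped_score.rank(ascending=False)

# Percentile within each type, mirroring the loop's rank(pct=True) * 100.
test_data["percentile_from_all"] = grouped_score.rank(pct=True) * 100

print(test_data)
```

Both columns come from a single grouped call each, so no Python-level loop over the types is needed. Tied scores receive averaged ranks by default; the method argument of rank changes that if a different tie rule is wanted.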
Upvotes: 2