I have the following data frame, A:
Bucket C Count
PL14 XY23081063 706
PL14 XY23326234 15
PL14 XY23081062 1
PL14 XY23143628 1
FZ595 XY23157633 353
FZ595 XY23683174 107
XM274 XY23681818 139
XM274 XY23681819 108
Now I want to insert a new column, "Bucket_Rank", which ranks "C" within each "Bucket" by descending value of "Count".
Required output, B:
Bucket C Count Bucket_Rank
PL14 XY23081063 706 1
PL14 XY23326234 15 2
PL14 XY23081062 1 3
PL14 XY23143628 1 4
FZ595 XY23157633 353 1
FZ595 XY23683174 107 2
XM274 XY23681818 139 1
XM274 XY23681819 108 2
I tried the solution given in the linked question "Ranking order per group in Pandas".
Command: B["Bucket_Rank"] = A.groupby("Bucket ")["Count"].rank("dense", ascending=False)
But it's giving me the following error:
TypeError: rank() got multiple values for argument 'axis'
During handling of the above exception, another exception occurred:
ValueError
Help appreciated...TIA
Upvotes: 1
Views: 4081
Use groupby + argsort:
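import numpy as np
# argsort twice: the first call gives the sort order, the second turns it into ranks
v = df.groupby('Bucket').Count\
      .transform(lambda x: np.argsort(np.argsort(-x, kind='stable')) + 1)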
v
0 1
1 2
2 3
3 4
4 1
5 2
6 1
7 2
Name: Count, dtype: int64
df['Bucket_Rank'] = v
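One caveat worth spelling out: a single argsort returns the positions that would sort a group, which coincides with the rank only when the rows are already in descending order within each group, as they are in the question. The following is a minimal sketch, assuming the same column names, that applies argsort twice so the ranks stay correct for unsorted input. The helper name rank_desc and the shuffled row order are illustrative assumptions, not part of the original post.

```python
import numpy as np
import pandas as pd

# Same values as the question, but PL14 and FZ595 are deliberately out of order.
df = pd.DataFrame({
    "Bucket": ["PL14", "PL14", "PL14", "PL14", "FZ595", "FZ595", "XM274", "XM274"],
    "C": ["XY23081062", "XY23081063", "XY23326234", "XY23143628",
          "XY23683174", "XY23157633", "XY23681818", "XY23681819"],
    "Count": [1, 706, 15, 1, 107, 353, 139, 108],
})

def rank_desc(counts):
    # Hypothetical helper: inner argsort gives the descending sort order,
    # outer argsort inverts that permutation into a 0-based rank per row.
    order = np.argsort(-counts.to_numpy(), kind="stable")
    return pd.Series(np.argsort(order) + 1, index=counts.index)

df["Bucket_Rank"] = df.groupby("Bucket")["Count"].transform(rank_desc)
print(df.sort_values(["Bucket", "Bucket_Rank"]))
```

The stable sort keeps tied counts in their order of appearance, so the two PL14 rows with Count 1 get ranks 3 and 4 rather than an arbitrary order.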
If you want to use rank, specify method='dense'. It is better to explicitly specify each keyword argument so as to prevent confusion.
df.groupby("Bucket")["Count"]\
.rank(method="dense", ascending=False)
0 1.0
1 2.0
2 3.0
3 3.0
4 1.0
5 2.0
6 1.0
7 2.0
Name: Count, dtype: float64
Note that this result isn't exactly what you're expecting, since equal counts are assigned the same rank (rows 2 and 3 both get 3.0). If you can live with that, rank should work just as well.
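If tied counts need distinct ranks, as in the required output, one option is rank with method="first", which breaks ties by order of appearance. This is a sketch under the assumption that the frame is named df and holds the question's data; the cast to int is only there to match the integer ranks in the desired output.

```python
import pandas as pd

# Data frame from the question, in its original row order.
df = pd.DataFrame({
    "Bucket": ["PL14", "PL14", "PL14", "PL14", "FZ595", "FZ595", "XM274", "XM274"],
    "C": ["XY23081063", "XY23326234", "XY23081062", "XY23143628",
          "XY23157633", "XY23683174", "XY23681818", "XY23681819"],
    "Count": [706, 15, 1, 1, 353, 107, 139, 108],
})

# method="first" gives tied counts consecutive ranks in order of appearance;
# rank returns floats, so cast to int (safe here because Count has no NaN).
df["Bucket_Rank"] = (
    df.groupby("Bucket")["Count"]
      .rank(method="first", ascending=False)
      .astype(int)
)
print(df)
```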
Upvotes: 1