smokinjoe

Reputation: 93

How to rank within a group in Python?

I have the following data frame

A >

Bucket  C           Count
PL14    XY23081063  706
PL14    XY23326234  15
PL14    XY23081062  1
PL14    XY23143628  1
FZ595   XY23157633  353
FZ595   XY23683174  107
XM274   XY23681818  139
XM274   XY23681819  108

Now I want to insert a new column "Bucket_Rank" that ranks "C" within each "Bucket" by descending value of "Count".

Required output: B >

Bucket  C           Count  Bucket_Rank
PL14    XY23081063  706    1
PL14    XY23326234  15     2
PL14    XY23081062  1      3
PL14    XY23143628  1      4
FZ595   XY23157633  353    1
FZ595   XY23683174  107    2
XM274   XY23681818  139    1
XM274   XY23681819  108    2

I tried the solution given in the following link

Ranking order per group in Pandas

command : B["Bucket_Rank"] = A.groupby("Bucket ")["Count"].rank("dense", ascending=False)

but it's giving me the following error:

TypeError: rank() got multiple values for argument 'axis'

During handling of the above exception, another exception occurred:

ValueError      

Help appreciated...TIA

Upvotes: 1

Views: 4081

Answers (1)

cs95

Reputation: 402323

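Use groupby + argsort. Note that argsort returns sort positions, which coincide with ranks here only because the rows in each bucket are already in descending order of Count: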

import numpy as np

v = df.groupby('Bucket').Count\
         .transform(lambda x: np.argsort(-x) + 1)
v

0    1
1    2
2    3
3    4
4    1
5    2
6    1
7    2
Name: Count, dtype: int64

df['Bucket_Rank'] = v
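
If the rows are not already ordered by Count within each bucket, a single argsort no longer yields ranks. A minimal sketch of one way around that, applying argsort twice so the sort permutation is inverted into ranks. The frame `df_unsorted` and the helper `descending_rank` are hypothetical names used only for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical data: counts are NOT pre-sorted within each bucket.
df_unsorted = pd.DataFrame({
    "Bucket": ["A", "A", "A", "B", "B"],
    "Count": [5, 20, 10, 7, 9],
})

def descending_rank(x):
    # First argsort: positions that would sort the values in descending order.
    order = np.argsort(-x.to_numpy(), kind="stable")
    # Second argsort: inverts that permutation, giving each row's 0-based rank.
    ranks = np.argsort(order, kind="stable") + 1
    return pd.Series(ranks, index=x.index)

df_unsorted["Bucket_Rank"] = (
    df_unsorted.groupby("Bucket")["Count"].transform(descending_rank)
)
print(df_unsorted)
```

The stable sort keeps tied counts in their order of appearance, so ties still receive distinct consecutive ranks.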

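If you want to use rank, specify method='dense' as a keyword argument. The TypeError in the question suggests that the positional "dense" was bound to the axis parameter rather than to method, so it is better to name each keyword argument explicitly to prevent confusion.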

df.groupby("Bucket")["Count"]\
      .rank(method="dense", ascending=False)

0    1.0
1    2.0
2    3.0
3    3.0
4    1.0
5    2.0
6    1.0
7    2.0
Name: Count, dtype: float64

Note that the result you get isn't exactly what you're expecting since equal counts are assigned the same rank. If you can live with that, rank should work just as well.
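
If tied counts need distinct ranks, as in the required output, one option is rank with method="first", which numbers ties in their order of appearance. A minimal sketch, assuming the question's data is rebuilt by hand as `df` and that an integer column is wanted (hence the `astype(int)`, which is an addition and not part of the original answer):

```python
import pandas as pd

# The question's data, reconstructed by hand.
df = pd.DataFrame({
    "Bucket": ["PL14", "PL14", "PL14", "PL14", "FZ595", "FZ595", "XM274", "XM274"],
    "C": ["XY23081063", "XY23326234", "XY23081062", "XY23143628",
          "XY23157633", "XY23683174", "XY23681818", "XY23681819"],
    "Count": [706, 15, 1, 1, 353, 107, 139, 108],
})

# method="first" breaks ties by order of appearance instead of sharing a rank.
df["Bucket_Rank"] = (
    df.groupby("Bucket")["Count"]
      .rank(method="first", ascending=False)
      .astype(int)  # rank returns floats; cast for an integer column
)
print(df)
```

Unlike the argsort version, this does not depend on the rows already being sorted within each bucket.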

Upvotes: 1
