sectechguy

Reputation: 2117

Pandas: how to get distinct ranks when the counts are not unique

I have a pandas DataFrame and I only want the top 10 counts for each device. I figured an easy way to do this is to create a new column called rank and then remove anything with a rank greater than 10. Here is the data:

        p_dt         device         namestr     count
0       2020-04-29   windows        m_outcome1  207209
1       2020-04-29   windows        m_outcome2  56599
2       2020-04-29   windows        m_outcome3  2880
3       2020-04-29   windows        m_outcome4  2879
4       2020-04-29   windows        m_outcome5  2879
... ... ... ... ...
50204   2020-04-29   web gateway    wg_outcome1 2
50205   2020-04-29   web gateway    wg_outcome2 2
50206   2020-04-29   web gateway    wg_outcome3 1
50207   2020-04-29   web gateway    wg_outcome4 1
50208   2020-04-29   web gateway    wg_outcome5 1

The issue is that when several rows for the same device have the same count, those rows all receive the same rank:

df.groupby('device', sort=False)['count'].rank(ascending=False)

0         1.0
1         2.0
2         3.0
3         5.0
4         5.0
         ... 
50204    20.5
50205    20.5
50206    23.0
50207    23.0
50208    23.0

When really for the same data I just want:

0         1.0
1         2.0
2         3.0
3         5.0
4         6.0
         ... 

Is there a way to accomplish this?

Upvotes: 0

Views: 1262

Answers (1)

Mayank Porwal

Reputation: 34086

Consider passing method='first' to rank, which breaks ties by the order in which the rows appear:

df.groupby('device', sort=False)['count'].rank(ascending=False, method='first')

This should give you unique ranks per group.
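
To see why the default produces repeated ranks, here is a minimal sketch comparing the default method='average' with method='first'. The sample frame is hypothetical (one device with a three-way tie at 2879) and is not the asker's real data:

```python
import pandas as pd

# Hypothetical sample: one device with a three-way tie at 2879.
df = pd.DataFrame({
    "device": ["windows"] * 6,
    "count": [207209, 56599, 2880, 2879, 2879, 2879],
})

grouped = df.groupby("device", sort=False)["count"]

# Default method="average": tied rows share the mean of the positions they span.
df["rank_average"] = grouped.rank(ascending=False)

# method="first": tied rows are ranked in the order they appear in the data.
df["rank_first"] = grouped.rank(ascending=False, method="first")

print(df)
```

With the default, the three tied rows occupy positions 4, 5 and 6 and each gets the mean, 5.0. With method='first' they get 4.0, 5.0 and 6.0.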

Tested for a sample of your dataframe:

In [860]: df['count']
Out[860]: 
0    207209
1     56599
2      2880
3      2879
4      2879
Name: count, dtype: int64

In [856]: df.groupby('device', sort=False)['count'].rank(ascending=False, method='first')
Out[856]: 
0    1.0
1    2.0
2    3.0
3    4.0 # different ranks for same value
4    5.0 # different ranks for same value
Name: count, dtype: float64
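
To finish the original task (keeping only the top rows per device), the unique ranks can be used as a filter. This is a sketch under assumptions: the sample frame is made up, and TOP_N is a hypothetical name set to 3 to keep the example small, where the question uses 10:

```python
import pandas as pd

# Hypothetical sample modelled on the question's columns.
df = pd.DataFrame({
    "device": ["windows"] * 4 + ["web gateway"] * 4,
    "namestr": ["m_outcome1", "m_outcome2", "m_outcome3", "m_outcome4",
                "wg_outcome1", "wg_outcome2", "wg_outcome3", "wg_outcome4"],
    "count": [207209, 56599, 2879, 2879, 2, 2, 1, 1],
})

TOP_N = 3  # assumption for the sample; the question wants 10

# Unique rank per device, ties broken by order of appearance.
rank = df.groupby("device", sort=False)["count"].rank(
    ascending=False, method="first"
)

# Keep only the rows whose rank falls within the top N of their device.
top = df[rank <= TOP_N]

# Alternative without a rank column: sort by count, then take the
# first N rows of each device.
top_alt = (
    df.sort_values("count", ascending=False, kind="stable")
      .groupby("device", sort=False)
      .head(TOP_N)
)

print(top)
```

Both approaches keep exactly TOP_N rows per device here. Which of several tied rows survives at the cutoff depends on row order, so sort on a secondary column first if that choice matters.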

Upvotes: 2
