Reputation: 2117
I have a pandas dataframe and I only want the top 10 rows by count from each device.
I figured an easy way to do this is to create a new column called rank
and then remove anything with a rank greater than 10. Here is the data:
       p_dt        device       namestr       count
0      2020-04-29  windows      m_outcome1   207209
1      2020-04-29  windows      m_outcome2    56599
2      2020-04-29  windows      m_outcome3     2880
3      2020-04-29  windows      m_outcome4     2879
4      2020-04-29  windows      m_outcome5     2879
...    ...         ...          ...             ...
50204  2020-04-29  web gateway  wg_outcome1       2
50205  2020-04-29  web gateway  wg_outcome2       2
50206  2020-04-29  web gateway  wg_outcome3       1
50207  2020-04-29  web gateway  wg_outcome4       1
50208  2020-04-29  web gateway  wg_outcome5       1
The issue is that when several rows for the same device have the same count,
they all get the same rank (the default method='average' gives tied rows
the mean of their positions), so one rank value is repeated several times:
df.groupby('device', sort=False)['count'].rank(ascending=False)
0 1.0
1 2.0
2 3.0
3 5.0
4 5.0
...
50204 20.5
50205 20.5
50206 23.0
50207 23.0
50208 23.0
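A small self-contained sketch makes the difference between the tie-breaking methods visible. The counts below are made up (a hypothetical three-way tie at 2879, loosely modeled on the question, not the real data), and the snippet ranks the same Series with each method value that Series.rank accepts:

```python
import pandas as pd

# Hypothetical counts for a single device, with a three-way tie at 2879.
s = pd.Series([207209, 56599, 2880, 2879, 2879, 2879])

# Rank the same values with every tie-breaking method (largest count = rank 1).
ranks = pd.DataFrame({
    method: s.rank(ascending=False, method=method)
    for method in ["average", "min", "max", "dense", "first"]
})
print(ranks)
```

With these values, the default average gives each of the three tied rows 5.0 (the mean of positions 4, 5 and 6), the same kind of repeated rank seen above, while first gives 4.0, 5.0 and 6.0. min and dense both rank the tied rows 4.0; they only differ in the rank assigned to whatever value comes after the tie.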
What I actually want, for the same data, is a unique rank for every row:
0 1.0
1 2.0
2 3.0
3 5.0
4 6.0
...
Is there a way to accomplish this?
Upvotes: 0
Views: 1262
Reputation: 34086
You should consider using method='first' in your rank call.
df.groupby('device', sort=False)['count'].rank(ascending=False, method='first')
This should give you unique ranks per group: tied values are ranked in the order they appear within the group.
Tested on a sample of your dataframe:
In [860]: df['count']
Out[860]:
0 207209
1 56599
2 2880
3 2879
4 2879
Name: count, dtype: int64
In [856]: df.groupby('device', sort=False)['count'].rank(ascending=False, method='first')
Out[856]:
0 1.0
1 2.0
2 3.0
3 4.0 # different ranks for same value
4 5.0 # different ranks for same value
Name: count, dtype: float64
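To finish the original goal (keep the top 10 rows per device), the rank can be stored as a column and filtered. The sketch below uses a tiny hypothetical frame with the question's column names and N = 3 so that a tie at the cutoff is visible; with the real data N would be 10. It also shows an alternative technique that skips the helper column: a stable sort by count followed by groupby(...).head(N).

```python
import pandas as pd

# Hypothetical sample shaped like the question's data (not the real values).
df = pd.DataFrame({
    "device": ["windows"] * 4 + ["web gateway"] * 3,
    "namestr": ["m_outcome1", "m_outcome2", "m_outcome3", "m_outcome4",
                "wg_outcome1", "wg_outcome2", "wg_outcome3"],
    "count": [207209, 56599, 2879, 2879, 2, 2, 1],
})

N = 3  # rows to keep per device; the question uses 10

# Unique descending rank within each device; ties broken by row order.
df["rank"] = (
    df.groupby("device", sort=False)["count"]
      .rank(ascending=False, method="first")
)
top_n = df[df["rank"] <= N]

# Alternative without a rank column: sort by count (mergesort is stable,
# so tied rows keep their original order), then take N rows per device.
top_n_alt = (
    df.sort_values("count", ascending=False, kind="mergesort")
      .groupby("device", sort=False)
      .head(N)
)

print(top_n)
print(top_n_alt)
```

In this sample the two windows rows tied at 2879 straddle the cutoff, and both approaches keep only the first of them (row 2). The rank column is useful if you also want to show the rank; head(N) avoids adding a column.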
Upvotes: 2