Reputation: 225
I have a dataframe with city, name and members. I need to find the top 5 groups (name) in terms of highest member ('members') count per city.
This is what I get when I use:
clust.groupby(['city','name']).agg({'members':sum})
members
city name
Bath AWS Bath User Group 346
Agile Bath & Bristol 957
Bath Crypto Chat 47
Bath JS 142
Bath Machine Learning Meetup 435
Belfast 4th Industrial Revolution Challenge 609
Belfast Adobe Meetup 66
Belfast Azure Meetup 205
Southampton Crypto Currency Trading SouthCoast 50
Southampton Bitcoin and Altcoin Meetup 50
Southampton Functional Programming Meetup 28
Southampton Virtual Reality Meetup 248
Sunderland Sunderland Digital 287
I need the top 5 but as you can see the member count doesn't seem to be ordered, i.e. 346 before 957 etc.
I've also tried sorting the values before-hand and do:
clust.sort_values(['city', 'name'], axis=0).groupby('city').head(5)
But that returns a similar series.
I've used this one too clust.groupby(['city', 'name']).head(5)
but it gives me all the rows and not top 5. It also isn't structured so not in alphabetical order.
Please help. Thanks
Upvotes: 4
Views: 7228
Reputation: 862511
I think need add ascending=[True, False]
to sort_values
and change column to members
for sorting:
clust = clust.groupby(['city','name'], as_index=False)['members'].sum()
df = clust.sort_values(['city', 'members'], ascending=[True, False]).groupby('city').head(5)
print (df)
city name members
1 Bath Agile Bath & Bristol 957
4 Bath Machine Learning Meetup 435
0 Bath AWS Bath User Group 346
3 Bath JS 142
2 Bath Crypto Chat 47
5 Belfast 4th Industrial Revolution Challenge 609
7 Belfast Azure Meetup 205
6 Belfast Adobe Meetup 66
11 Southampton Virtual Reality Meetup 248
8 Southampton Crypto Currency Trading SouthCoast 50
9 Southampton Bitcoin and Altcoin Meetup 50
10 Southampton Functional Programming Meetup 28
12 Sunderland Sunderland Digital 287
Upvotes: 7