Reputation: 1647
I have a dataframe that is in this form.
Type Major GPA
F A 2.6
T B 3.4
T C 2.9
F A 1.8
T B 2.8
F C 3.5
...
I'd like to group the Dataframe ("students") by Type
and Major
, count the number of rows for each grouping, then sort from most to least popular majors for each type, and, finally, create a new dataframe that includes the 20 most popular majors.
I'd like the output to look like this:
F
A 21
B 19
C 15
...
T
A 14
B 7
C 3
This is what I did:
most_popular = students.groupby(['Type', 'Major']).size().sort_values(ascending=False)[:20]
But what this does is sort across both Types - rather than sort separately for each.
Thank you for your help.
Upvotes: 3
Views: 10573
Reputation: 2005
most_popular = students.groupby(['Type', 'Major']).size().reset_index().sort_values(['Type', 'Major'], ascending=[True, False])[:20]
The key is to sort in both ASC and DSC order, you can use:
.sort_values(['Type', 'Major'], ascending=[True, False])
Upvotes: 1
Reputation: 109546
The results are sorted automatically as a default argument. Is this the desired output?
>>> df.groupby(['Type', 'Major'], as_index=False).GPA.count().sort_values(['Major', 'GPA'])
Type Major GPA
0 F A 2
2 T B 2
1 F C 1
3 T C 1
Upvotes: 0