Sort values within dataframe grouped by multiple columns

Question

I have a dataframe that is in this form.

 Type  Major   GPA   
  F      A     2.6   
  T      B     3.4   
  T      C     2.9   
  F      A     1.8   
  T      B     2.8   
  F      C     3.5 
 ...

I'd like to group the Dataframe ("students") by Type and Major, count the number of rows for each grouping, then sort from most to least popular majors for each type, and, finally, create a new dataframe that includes the 20 most popular majors.

I'd like the output to look like this:

F   
A 21  
B 19  
C 15
...
T  
A 14  
B 7  
C 3

This is what I did:

most_popular = students.groupby(['Type', 'Major']).size().sort_values(ascending=False)[:20]

But what this does is sort across both Types - rather than sort separately for each.

Thank you for your help.

Alexander · Accepted Answer

The results are sorted automatically as a default argument. Is this the desired output?

>>> df.groupby(['Type', 'Major'], as_index=False).GPA.count().sort_values(['Major', 'GPA'])
  Type Major  GPA
0    F     A    2
2    T     B    2
1    F     C    1
3    T     C    1

Sort values within dataframe grouped by multiple columns

Answers (2)

Related Questions