James Eaves
James Eaves

Reputation: 1647

Sort values within dataframe grouped by multiple columns

I have a dataframe that is in this form.

 Type  Major   GPA   
  F      A     2.6   
  T      B     3.4   
  T      C     2.9   
  F      A     1.8   
  T      B     2.8   
  F      C     3.5 
 ...

I'd like to group the Dataframe ("students") by Type and Major, count the number of rows for each grouping, then sort from most to least popular majors for each type, and, finally, create a new dataframe that includes the 20 most popular majors.

I'd like the output to look like this:

F   
A 21  
B 19  
C 15
...
T  
A 14  
B 7  
C 3   

This is what I did:

most_popular = students.groupby(['Type', 'Major']).size().sort_values(ascending=False)[:20]

But what this does is sort across both Types - rather than sort separately for each.

Thank you for your help.

Upvotes: 3

Views: 10573

Answers (2)

uniquegino
uniquegino

Reputation: 2005

most_popular = students.groupby(['Type', 'Major']).size().reset_index().sort_values(['Type', 'Major'], ascending=[True, False])[:20]

The key is to sort in both ASC and DSC order, you can use:

.sort_values(['Type', 'Major'], ascending=[True, False])

Upvotes: 1

Alexander
Alexander

Reputation: 109546

The results are sorted automatically as a default argument. Is this the desired output?

>>> df.groupby(['Type', 'Major'], as_index=False).GPA.count().sort_values(['Major', 'GPA'])
  Type Major  GPA
0    F     A    2
2    T     B    2
1    F     C    1
3    T     C    1

Upvotes: 0

Related Questions