misheekoh

Reputation: 460

Dataframe groupby sort (categorical variable)

In [167]:
    df

Out[167]:
    Gender  University
0   Male    A
1   Female  B
2   Male    C
3   Male    D
4   Male    E
5   Female  A
6   Female  B
7   Female  C
8   Female  D
9   Female  E

In [168]:
df.groupby(['University','Gender'])['Gender'].size().unstack('Gender').fillna(0)

Out[168]:

[Image: output of In [168], a table of counts with University as rows and Gender as columns]
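Here is a minimal sketch that reproduces the In [168] table without the screenshot. It rebuilds the question's ten rows by hand and passes `fill_value=0` to `unstack` instead of calling `fillna(0)` afterwards. It also shows `pd.crosstab`, a different pandas function that the question does not use, which produces the same counts in one call.

```python
import pandas as pd

# The question's data, typed in from the In [167] output above
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Male", "Male",
               "Female", "Female", "Female", "Female", "Female"],
    "University": list("ABCDEABCDE"),
})

# Same table as In [168]; fill_value=0 fills the missing (B, Male) cell
# during the unstack itself, so no separate fillna(0) step is needed
counts = (
    df.groupby(["University", "Gender"])
      .size()
      .unstack("Gender", fill_value=0)
)

# pd.crosstab builds an equivalent count table directly
ct = pd.crosstab(df["University"], df["Gender"])

print(counts)
```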

Now I would like to sort by Female and Male from highest to lowest, so that the bars appear in descending order when I make a bar plot. I have tried many ways, but to no avail.

In my last attempt I tried:

df.groupby(['University','Gender'])['Gender'].size().unstack('Gender').fillna(0).sort_values(ascending=False)

TypeError: sort_values() missing 1 required positional argument: 'by'

Any suggestions?
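The TypeError comes from `DataFrame.sort_values`, whose `by` argument is required. A DataFrame has several columns, so pandas cannot guess which one to sort on, whereas `Series.sort_values` needs no key. The small sketch below uses the question's count table, typed in by hand (the values follow from the ten rows above).

```python
import pandas as pd

# Counts per University and Gender, as produced by In [168]
counts = pd.DataFrame(
    {"Female": [1, 2, 1, 1, 1], "Male": [1, 0, 1, 1, 1]},
    index=pd.Index(list("ABCDE"), name="University"),
)

# A Series holds a single column of values, so no `by` is needed
female_sorted = counts["Female"].sort_values(ascending=False)

# A DataFrame must be told which column(s) to sort on
by_female = counts.sort_values(by="Female", ascending=False)

# Omitting `by` reproduces the error from the question
error = None
try:
    counts.sort_values(ascending=False)
except TypeError as err:
    error = err
print(error)
```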

Upvotes: 1

Views: 1766

Answers (1)

jezrael

Reputation: 862611

You can sort by one column or the other:

print (df)
   Gender University
0    Male          A
1  Female          B
3    Male          D
4    Male          E
5  Female          A
2    Male          C
3    Male          D
4    Male          E
5  Female          A
6  Female          B
7  Female          C
8  Female          D
4    Male          E
5  Female          A
6  Female          B
3    Male          D
4    Male          E
5  Female          A
7  Female          C
8  Female          D
9  Female          E
# Wrapping the chain in parentheses lets it span several lines
df1 = (df.groupby(['University','Gender'])['Gender']
         .size()
         .unstack('Gender', fill_value=0)
         .sort_values(by='Female', ascending=False))

print (df1)
Gender      Female  Male
University              
A                4     1
B                3     0
C                2     1
D                2     3
E                1     4
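For anyone who wants to run the first example end to end, here is a self-contained sketch that rebuilds the 21-row frame from the print(df) output above. It drops the `['Gender']` selection before `.size()`, since `size()` only counts rows per group and gives the same result either way.

```python
import pandas as pd

# The answer's 21-row example, typed in from its print(df) output.
# Its repeated index labels don't affect the counts, so a default index is used.
gender = ["Male", "Female", "Male", "Male", "Female", "Male", "Male",
          "Male", "Female", "Female", "Female", "Female", "Male", "Female",
          "Female", "Male", "Male", "Female", "Female", "Female", "Female"]
university = list("ABDEACDEABCDEABDEACDE")
df = pd.DataFrame({"Gender": gender, "University": university})

# Count rows per (University, Gender), pivot Gender into columns,
# then order the universities by their Female count, highest first
df1 = (
    df.groupby(["University", "Gender"])
      .size()
      .unstack("Gender", fill_value=0)
      .sort_values(by="Female", ascending=False)
)
print(df1)
```

`df1.plot.bar()` draws the bars in the frame's row order, which is why sorting the rows first is enough to get a descending plot.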

df1.plot.bar()

[Image: graph1, bar plot of df1]

# Same chain, but ordered by the Male count
df2 = (df.groupby(['University','Gender'])['Gender']
         .size()
         .unstack('Gender', fill_value=0)
         .sort_values(by='Male', ascending=False))
print (df2)
Gender      Female  Male
University              
E                1     4
D                2     3
A                4     1
C                2     1
B                3     0
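If the goal is a single ordering that reflects both columns at once, one option not covered in this answer is to sort on the row total. The following is a sketch under that assumption: the `Total` column name and the use of Female as a tie-breaker are illustrative choices, not part of the original code, and the counts are typed in by hand from the outputs above.

```python
import pandas as pd

# Count table from the df1/df2 outputs above
counts = pd.DataFrame(
    {"Female": [4, 3, 2, 2, 1], "Male": [1, 0, 1, 3, 4]},
    index=pd.Index(list("ABCDE"), name="University"),
)

# Hypothetical helper column: Female + Male per university
totals = counts.assign(Total=counts.sum(axis=1))

# Largest total first; Female breaks ties (an arbitrary choice for this sketch)
by_total = totals.sort_values(by=["Total", "Female"], ascending=False)
print(by_total)
```

Drop the helper column before plotting, e.g. `by_total.drop(columns="Total").plot.bar()`, so it is not drawn as a third bar.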

df2.plot.bar()

[Image: graph2, bar plot of df2]

If you sort by both columns, the second column only breaks ties in the first, i.e. it reorders only the universities with the same Female count (here C and D):

# Sort by Female first; Male is used only where Female is tied
df3 = (df.groupby(['University','Gender'])['Gender']
         .size()
         .unstack('Gender', fill_value=0)
         .sort_values(by=['Female', 'Male'], ascending=False))
print (df3)

df3.plot.bar()

[Image: bar plot of df3]
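The tie-breaking can be checked directly on the count table, typed in by hand from the df1 output. `ascending` also accepts a list with one flag per key, which this sketch uses to show mixed sort directions.

```python
import pandas as pd

# Count table from the df1 output above
counts = pd.DataFrame(
    {"Female": [4, 3, 2, 2, 1], "Male": [1, 0, 1, 3, 4]},
    index=pd.Index(list("ABCDE"), name="University"),
)

# Both keys descending: Female decides the order, Male only separates C and D
df3 = counts.sort_values(by=["Female", "Male"], ascending=False)

# One flag per key: Female descending, Male ascending within ties
df3_mixed = counts.sort_values(by=["Female", "Male"], ascending=[False, True])

print(df3)
print(df3_mixed)
```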

Upvotes: 1
