Sorting a pandas dataframe based on number of values of a categorical column

Question

The sample dataset looks like this:

col1	col2	col3
A	1	as
A	2	sd
B	3	df
C	5	fg
D	6	gh
A	1	hj
B	3	jk
B	4	kt
A	1	re
C	5	we
D	6	qw
D	7	aa

I want to sort the dataframe on col1 by the number of occurrences of each item, e.g. A has 4 occurrences, B and D have 3 each, and C has 2. The dataframe should be sorted like A,A,A,A,B,B,B,D,D,D,C,C.

Is there a way to achieve this? Can I use sort_values to get the desired result?

jezrael · Accepted Answer

Create a helper column holding each row's group size with Series.map and Series.value_counts, then sort by that column (descending) and by col1 (ascending, to break ties) with DataFrame.sort_values:

df['new'] = df['col1'].map(df['col1'].value_counts())
#alternative
#df['new'] = df.groupby('col1')['col1'].transform('count')

df1 = df.sort_values(['new','col1'], ascending=[False, True]).drop('new', axis=1)
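
To see what the helper column contains before it is used for sorting, here is a minimal sketch. It assumes the sample data from the question is rebuilt by hand as below; the names counts and helper are only illustrative and are not part of the answer's code.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "col1": list("AABCDABBACDD"),
    "col2": [1, 2, 3, 5, 6, 1, 3, 4, 1, 5, 6, 7],
    "col3": ["as", "sd", "df", "fg", "gh", "hj", "jk", "kt", "re", "we", "qw", "aa"],
})

# value_counts returns a Series indexed by the distinct values of col1,
# holding how many times each value occurs.
counts = df["col1"].value_counts()

# map looks up each row's col1 value in that Series, so every row
# receives the size of the group it belongs to.
helper = df["col1"].map(counts)

print(helper.tolist())  # [4, 4, 3, 2, 3, 4, 3, 3, 4, 2, 3, 3]
```

Sorting on this column alone would leave B and D rows interleaved, since both have a count of 3. That is why col1 is passed as the second sort key.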

One line solution:

df1 = (df.assign(new=df['col1'].map(df['col1'].value_counts()))
         .sort_values(['new','col1'], ascending=[False, True])
         .drop('new', axis=1))

print (df1)
   col1  col2 col3
0     A     1   as
1     A     2   sd
5     A     1   hj
8     A     1   re
2     B     3   df
6     B     3   jk
7     B     4   kt
4     D     6   gh
10    D     6   qw
11    D     7   aa
3     C     5   fg
9     C     5   we
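
If this is needed in several places, the same steps could be wrapped in a small function. The sketch below is only one possible way to do it: sort_by_frequency and the temporary column name _freq are hypothetical names chosen for illustration, and the code assumes the dataframe has no existing column called _freq.

```python
import pandas as pd


def sort_by_frequency(df, col):
    """Hypothetical helper: rows whose `col` value is most frequent come first.

    Ties between equally frequent values are broken by the value itself,
    in ascending order. Assumes df has no column named "_freq".
    """
    # Size of the group each row belongs to.
    freq = df[col].map(df[col].value_counts())
    return (df.assign(_freq=freq)
              .sort_values(["_freq", col], ascending=[False, True])
              .drop(columns="_freq"))


# Sample data copied from the question.
df = pd.DataFrame({
    "col1": list("AABCDABBACDD"),
    "col2": [1, 2, 3, 5, 6, 1, 3, 4, 1, 5, 6, 7],
    "col3": ["as", "sd", "df", "fg", "gh", "hj", "jk", "kt", "re", "we", "qw", "aa"],
})

result = sort_by_frequency(df, "col1")
print(result["col1"].tolist())
# ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'D', 'D', 'D', 'C', 'C']
```

The original dataframe is left unchanged because assign returns a copy.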
