Mohamed Afiq
Mohamed Afiq

Reputation: 59

Pandas sort but maintain in group

For example i have df with such columns :

Area     Prod     Count
  A           A1          5
  A           A2          4
  B           B1          5
  B           B2         11
  B           B3         15
  C           C1          5 

I want to apply sort based on occurrences, then sort by Count For instance after sorting the df should look like this:

Area     Prod     Count
  B           B3         15
  B           B2         11
  B           B1          1
  A           A1          5
  A           A2          4
  C           C1          5 

Since B have 3 rows it is sort first, then inside B the Prod is sort by Count I tried using df.groupby('area', 'prod').sort_values(by=('Count'), ascending=False)) didn't able to solve it. Thanks

Upvotes: 0

Views: 48

Answers (1)

jezrael
jezrael

Reputation: 862541

One idea is create index by counts and then use it for sorting:

df.index = df['Area'].map(df['Area'].value_counts()).rename('idx')

print (df.sort_values(['idx','Count'], ascending=False))
    Area Prod  Count
idx                 
3      B   B3     15
3      B   B2     11
3      B   B1      5
2      A   A1      5
2      A   A2      4
1      C   C1      5

df = df.sort_values(['idx','Count'], ascending=False).reset_index(drop=True)
print (df)
  Area Prod  Count
0    B   B3     15
1    B   B2     11
2    B   B1      5
3    A   A1      5
4    A   A2      4
5    C   C1      5

But problem should be if there is multiple same counts, added D group with length 3:

print (df)
  Area Prod  Count
0    A   A1      5
1    A   A2      4
2    B   B1      5
3    B   B2     11
4    B   B3     15
5    C   C1      5
6    D   A1     50
7    D   A2     40
8    D   B1      5

Then create index by counts:

df.index = df['Area'].map(df['Area'].value_counts()).rename('idx')
print (df)
    Area Prod  Count
idx                 
2      A   A1      5
2      A   A2      4
3      B   B1      5
3      B   B2     11
3      B   B3     15
1      C   C1      5
3      D   A1     50
3      D   A2     40
3      D   B1      5

But if sorting like first solution are mixed rows:

print (df.sort_values(['idx','Count'], ascending=False))
    Area Prod  Count
idx                 
3      D   A1     50
3      D   A2     40
3      B   B3     15
3      B   B2     11
3      B   B1      5
3      D   B1      5
2      A   A1      5
2      A   A2      4
1      C   C1      5

Solution is add column Area for sorting, e.g. all descending:

print (df.sort_values(['idx','Area','Count'], ascending=False))
    Area Prod  Count
idx                 
3      D   A1     50
3      D   A2     40
3      D   B1      5
3      B   B3     15
3      B   B2     11
3      B   B1      5
2      A   A1      5
2      A   A2      4
1      C   C1      5

Or you can sort Area by ascending:

print (df.sort_values(['idx','Area','Count'], ascending=[False, True, False]))
    Area Prod  Count
idx                 
3      B   B3     15
3      B   B2     11
3      B   B1      5
3      D   A1     50
3      D   A2     40
3      D   B1      5
2      A   A1      5
2      A   A2      4
1      C   C1      5

Upvotes: 1

Related Questions