nbecker
nbecker

Reputation: 1719

pandas groupby sort descending order

pandas groupby will by default sort. But I'd like to change the sort order. How can I do this?

I'm guessing that I can't apply a sort method to the returned groupby object.

Upvotes: 64

Views: 262550

Answers (8)

Hans-Peter
Hans-Peter

Reputation: 11

Depending on your needs, the simplest solution might be:

list_of_groups = list(df.groupby('group_name'))[::-1]

Upvotes: 1

data_runner
data_runner

Reputation: 73

use 'by' argument in 'sort_values' clause A generic example -'Customer Name' and 'Profit' are columns

df.groupby('Customer Name').Profit.agg(['count', 'min', 'max', 
            'mean']).sort_values(by = ['count'], ascending=False)

Upvotes: 0

callpete
callpete

Reputation: 610

Similar to one of the answers above, but try adding .sort_values() to your .groupby() will allow you to change the sort order. If you need to sort on a single column, it would look like this:

df.groupby('group')['id'].count().sort_values(ascending=False)

ascending=False will sort from high to low, the default is to sort from low to high.

*Careful with some of these aggregations. For example .size() and .count() return different values since .size() counts NaNs.

What is the difference between size and count in pandas?

Upvotes: 26

Jim Arnold
Jim Arnold

Reputation: 71

This kind of operation is covered under hierarchical indexing. Check out the examples here

When you groupby, you're making new indices. If you also pass a list through .agg(). you'll get multiple columns. I was trying to figure this out and found this thread via google.

It turns out if you pass a tuple corresponding to the exact column you want sorted on.

Try this:

# generate toy data 
ex = pd.DataFrame(np.random.randint(1,10,size=(100,3)), columns=['features', 'AUC', 'recall'])

# pass a tuple corresponding to which specific col you want sorted. In this case, 'mean' or 'AUC' alone are not unique. 
ex.groupby('features').agg(['mean','std']).sort_values(('AUC', 'mean'))

This will output a df sorted by the AUC-mean column only.

Upvotes: 6

The Unfun Cat
The Unfun Cat

Reputation: 31908

You can do a sort_values() on the dataframe before you do the groupby. Pandas preserves the ordering in the groupby.

In [44]: d.head(10)
Out[44]:
              name transcript  exon
0  ENST00000456328          2     1
1  ENST00000450305          2     1
2  ENST00000450305          2     2
3  ENST00000450305          2     3
4  ENST00000456328          2     2
5  ENST00000450305          2     4
6  ENST00000450305          2     5
7  ENST00000456328          2     3
8  ENST00000450305          2     6
9  ENST00000488147          1    11

for _, a in d.head(10).sort_values(["transcript", "exon"]).groupby(["name", "transcript"]): print(a)
              name transcript  exon
1  ENST00000450305          2     1
2  ENST00000450305          2     2
3  ENST00000450305          2     3
5  ENST00000450305          2     4
6  ENST00000450305          2     5
8  ENST00000450305          2     6
              name transcript  exon
0  ENST00000456328          2     1
4  ENST00000456328          2     2
7  ENST00000456328          2     3
              name transcript  exon
9  ENST00000488147          1    11

Upvotes: 6

Surya Chhetri
Surya Chhetri

Reputation: 11568

Other instance of preserving the order or sort by descending:

In [97]: import pandas as pd                                                                                                    

In [98]: df = pd.DataFrame({'name':['A','B','C','A','B','C','A','B','C'],'Year':[2003,2002,2001,2003,2002,2001,2003,2002,2001]})

#### Default groupby operation:
In [99]: for each in df.groupby(["Year"]): print each                                                                           
(2001,    Year name
2  2001    C
5  2001    C
8  2001    C)
(2002,    Year name
1  2002    B
4  2002    B
7  2002    B)
(2003,    Year name
0  2003    A
3  2003    A
6  2003    A)

### order preserved:
In [100]: for each in df.groupby(["Year"], sort=False): print each                                                               
(2003,    Year name
0  2003    A
3  2003    A
6  2003    A)
(2002,    Year name
1  2002    B
4  2002    B
7  2002    B)
(2001,    Year name
2  2001    C
5  2001    C
8  2001    C)

In [106]: df.groupby(["Year"], sort=False).apply(lambda x: x.sort_values(["Year"]))                        
Out[106]: 
        Year name
Year             
2003 0  2003    A
     3  2003    A
     6  2003    A
2002 1  2002    B
     4  2002    B
     7  2002    B
2001 2  2001    C
     5  2001    C
     8  2001    C

In [107]: df.groupby(["Year"], sort=False).apply(lambda x: x.sort_values(["Year"])).reset_index(drop=True)
Out[107]: 
   Year name
0  2003    A
1  2003    A
2  2003    A
3  2002    B
4  2002    B
5  2002    B
6  2001    C
7  2001    C
8  2001    C

Upvotes: 10

szeitlin
szeitlin

Reputation: 3341

Do your groupby, and use reset_index() to make it back into a DataFrame. Then sort.

grouped = df.groupby('mygroups').sum().reset_index()
grouped.sort_values('mygroups', ascending=False)

Upvotes: 69

JD Long
JD Long

Reputation: 60756

As of Pandas 0.18 one way to do this is to use the sort_index method of the grouped data.

Here's an example:

np.random.seed(1)
n=10
df = pd.DataFrame({'mygroups' : np.random.choice(['dogs','cats','cows','chickens'], size=n), 
                   'data' : np.random.randint(1000, size=n)})

grouped = df.groupby('mygroups', sort=False).sum()
grouped.sort_index(ascending=False)
print grouped

data
mygroups      
dogs      1831
chickens  1446
cats       933

As you can see, the groupby column is sorted descending now, indstead of the default which is ascending.

Upvotes: 29

Related Questions