pandas groupby then aggregate results order not repeatable?

Question

Here's my codes:

import pandas as pd
df = pd.DataFrame({'id': ['a', 'a', 'b', 'b', 'b'],
                   'v1': [1, 2, 3 ,4, 5],
                   'v2': [7, 6, 5, 4, 3],
                   'v3': [2, 4, 6, 8, 10]})

df.groupby('id').agg({'v1': ['count', 'sum'],
                      'v2': 'mean',
                      'v3': 'sum'})

My issue is that every time I run this code (in jupyter notebook), the resulted dataframe have different column orders. Is this a bug in pandas?

My next step is to rename the resulted dataframe, but with no reproducible order, it is kinda impossible to write a reusable code to do that. How could I work around it?

Btw, I am using python 3.5 and pandas 0.23.0.

Thanks!

sacuL · Accepted Answer

I don't get different ordering when I run the code multiple times. However, if you're running into this problem, you could just name the order you want after your agg. For instance, if you wanted the order v2, v3, v1, do:

df.groupby('id',sort=False).agg({'v1': ['count', 'sum'],
                      'v2': 'mean',
                      'v3': 'sum'})[['v2','v3','v1']]

     v2  v3    v1    
   mean sum count sum
id                   
a   6.5   6     2   3
b   4.0  24     3  12

Or if you simply want the same ordering as your original dataframe:

df.groupby('id',as_index=False).agg({'v1': ['count', 'sum'],
                                     'v2': 'mean',
                                     'v3': 'sum'})[df.columns]

  id    v1       v2  v3
     count sum mean sum
0  a     2   3  6.5   6
1  b     3  12  4.0  24

but all in all, @Allolz's comment makes the most sense IMO, using sort_index(1):

df.groupby('id').agg({'v1': ['count', 'sum'],
                      'v2': 'mean',
                      'v3': 'sum'}).sort_index(1)

pandas groupby then aggregate results order not repeatable?

Answers (2)

Related Questions