Reputation: 4637
Here's my codes:
import pandas as pd
df = pd.DataFrame({'id': ['a', 'a', 'b', 'b', 'b'],
'v1': [1, 2, 3 ,4, 5],
'v2': [7, 6, 5, 4, 3],
'v3': [2, 4, 6, 8, 10]})
df.groupby('id').agg({'v1': ['count', 'sum'],
'v2': 'mean',
'v3': 'sum'})
My issue is that every time I run this code (in jupyter notebook), the resulted dataframe have different column orders. Is this a bug in pandas?
My next step is to rename the resulted dataframe, but with no reproducible order, it is kinda impossible to write a reusable code to do that. How could I work around it?
Btw, I am using python 3.5 and pandas 0.23.0.
Thanks!
Upvotes: 0
Views: 210
Reputation: 1314
dict
keys are not ordered in Python 3.5; this applies to the dict
s used in pd.DataFrame()
and .agg()
in your code. You could used an OrderedDict
from the collections
module in the Python standard library or you could use DataFrame.reindex()
to re-organize your columns and rows as follows:
import pandas as pd
df = pd.DataFrame({
'id': ['a', 'a', 'b', 'b', 'b'],
'v1': [1, 2, 3 ,4, 5],
'v2': [7, 6, 5, 4, 3],
'v3': [2, 4, 6, 8, 10]
})
df = df.set_index('id')
df = df.reindex(columns=['v1', 'v2', 'v3'])
df.groupby(level='id').agg({
'v1': ['count', 'sum'],
'v2': 'mean',
'v3': 'sum'
})
DataFrame.reindex()
is more general than DataFrame.sort_index()
.
Upvotes: 0
Reputation: 51335
I don't get different ordering when I run the code multiple times. However, if you're running into this problem, you could just name the order you want after your agg
. For instance, if you wanted the order v2
, v3
, v1
, do:
df.groupby('id',sort=False).agg({'v1': ['count', 'sum'],
'v2': 'mean',
'v3': 'sum'})[['v2','v3','v1']]
v2 v3 v1
mean sum count sum
id
a 6.5 6 2 3
b 4.0 24 3 12
Or if you simply want the same ordering as your original dataframe:
df.groupby('id',as_index=False).agg({'v1': ['count', 'sum'],
'v2': 'mean',
'v3': 'sum'})[df.columns]
id v1 v2 v3
count sum mean sum
0 a 2 3 6.5 6
1 b 3 12 4.0 24
but all in all, @Allolz's comment makes the most sense IMO, using sort_index(1)
:
df.groupby('id').agg({'v1': ['count', 'sum'],
'v2': 'mean',
'v3': 'sum'}).sort_index(1)
Upvotes: 3