Reputation: 1216
I have a groupby on multiple columns, and the resulting keys contain all of the grouping columns, which makes the output hard to read. Here's an example:
import pandas as pd
import numpy as np
from pandas import Series

df = pd.DataFrame({'A': [1, 1, 2, 2],
                   'B': [1, 2, 2, 2],
                   'C': np.random.randn(4),
                   'D': ['one', 'two', 'three', 'four']})

def aggregate(x):
    return Series(dict(C=round(x['C'].mean()), D=' '.join(x['D'])))

print(df.groupby(['A', 'B']).apply(aggregate))
        C           D
A B
1 1   0.0         one
  2  -1.0         two
2 2  -0.0  three four
How can I get 'normal' keys, like this?
     C           D
0  0.0         one
1 -1.0         two
2 -0.0  three four
Upvotes: 1
Views: 627
Reputation: 862761
For better performance, use DataFrameGroupBy.agg with a dictionary, then call reset_index with drop=True to remove the MultiIndex:
aggregate = {'C':lambda x: round(x.mean()), 'D':' '.join}
print(df.groupby(['A', 'B']).agg(aggregate).reset_index(drop=True))
C D
0 0.0 one
1 0.0 two
2 1.0 three four
If you want to convert the MultiIndex levels to columns instead, there are two ways:
print(df.groupby(['A', 'B'], as_index=False).agg(aggregate))
Or:
print(df.groupby(['A', 'B']).agg(aggregate).reset_index())
A B C D
0 1 1 0.0 one
1 1 2 -1.0 two
2 2 2 -1.0 three four
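As a minimal sketch of the two variants above, the snippet below checks that as_index=False and a plain reset_index() produce the same columns and values. It assumes fixed values for C (instead of np.random.randn) so the result is reproducible, and uses the built-in 'mean' aggregation without rounding to keep the comparison simple; the names via_flag and via_reset are illustrative only.

```python
import pandas as pd

# Assumption: fixed values for C so the example is reproducible.
df = pd.DataFrame({'A': [1, 1, 2, 2],
                   'B': [1, 2, 2, 2],
                   'C': [0.5, -1.5, 2.0, 4.0],
                   'D': ['one', 'two', 'three', 'four']})

# One aggregation per column: mean for C, space-joined strings for D.
aggregate = {'C': 'mean', 'D': ' '.join}

# Variant 1: keep the group keys as ordinary columns from the start.
via_flag = df.groupby(['A', 'B'], as_index=False).agg(aggregate)

# Variant 2: aggregate with a MultiIndex, then move its levels into columns.
via_reset = df.groupby(['A', 'B']).agg(aggregate).reset_index()

print(via_flag)
print(via_reset)
```

Both variants should end up with the columns A, B, C, D and a default integer index; which one to use is mostly a matter of taste.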
Upvotes: 1
Reputation: 164693
You can use reset_index and specify the optional parameter drop=True. Note that this removes your grouping key index entirely.
print(df.groupby(['A', 'B']).apply(aggregate).reset_index(drop=True))
C D
0 0 one
1 -1 two
2 0 three four
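To illustrate the note about losing the grouping keys, here is a small sketch that compares reset_index(drop=True) with reset_index() on the apply result. It assumes fixed values for C so it is reproducible, and it selects the C and D columns explicitly before apply (an adjustment not in the original code, made so that the grouping columns are not passed to the function); the names dropped and kept are illustrative.

```python
import pandas as pd
from pandas import Series

# Assumption: fixed values for C so the example is reproducible.
df = pd.DataFrame({'A': [1, 1, 2, 2],
                   'B': [1, 2, 2, 2],
                   'C': [0.5, -1.5, 2.0, 4.0],
                   'D': ['one', 'two', 'three', 'four']})

def aggregate(x):
    # Rounded mean of C and space-joined strings of D for one group.
    return Series(dict(C=round(x['C'].mean()), D=' '.join(x['D'])))

grouped = df.groupby(['A', 'B'])[['C', 'D']].apply(aggregate)

# drop=True discards the A/B index levels altogether.
dropped = grouped.reset_index(drop=True)

# Without drop=True the A/B levels come back as regular columns.
kept = grouped.reset_index()

print(dropped.columns.tolist())
print(kept.columns.tolist())
```

If the group keys are still needed later, the plain reset_index() call is the safer choice.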
Upvotes: 1