I want to pivot a DataFrame with multi-indexed columns, but it fails with:
Shape of passed values is (3, 4), indices imply (3, 2)
The code:
import pandas as pd
df = pd.DataFrame({
    'foo': [1, 2, 3], 'bar': [4, 5, 6],
    'dt': ['2020-01-01', '2020-01-01', '2020-01-02'], 'cat': ['a', 'b', 'b']
})
df = df.groupby(['dt', 'cat']).describe().loc[:, pd.IndexSlice[:, ['count', '50%']]].reset_index()
columns_of_interest = sorted(df.drop(['dt', 'cat'], axis=1, level=0).columns.get_level_values(0).unique())
df.pivot(index='dt', columns='cat', values=columns_of_interest)
How can it be fixed?
Expected result:
from:
           dt cat   foo        bar
                  count  50% count  50%
0  2020-01-01   a   1.0  1.0   1.0  4.0
1  2020-01-01   b   1.0  2.0   1.0  5.0
2  2020-01-02   b   1.0  3.0   1.0  6.0
to:
value   foo       bar
cat       a   b     a   b
dt
0
1
2
Basically, I want to calculate:
v = 'count'
df['foo'][v].reset_index().pivot(index='dt', columns='cat', values=v)
for each column [foo, bar] and each aggregation [count, 50%], and get a single combined result back, i.e.:
piv_values = ['count', '50%']  # the aggregations listed above
for c in columns_of_interest:
    print(c)
    for piv in piv_values:
        print(piv)
        r = df[c][piv].reset_index().pivot(index='dt', columns='cat', values=piv)
        display(r)
I am not sure 1) how to recombine the results, or 2) whether there is a neater solution.
A rather neat workaround is to flatten the column levels:
df.columns = ['_'.join(col).strip() for col in df.columns.values]
columns_of_interest = df.columns
df.reset_index().pivot(index='dt', columns='cat', values=columns_of_interest)
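For reference, here is a self-contained sketch of that flattening workaround. It assumes the flattening is applied to the grouped frame before any reset_index, so that dt and cat are still in the index and only the value columns get joined; the names grouped and flat are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'foo': [1, 2, 3], 'bar': [4, 5, 6],
    'dt': ['2020-01-01', '2020-01-01', '2020-01-02'], 'cat': ['a', 'b', 'b']
})

# Assumption: dt and cat stay in the index here (no reset_index yet),
# so only the (column, aggregation) pairs are flattened.
grouped = df.groupby(['dt', 'cat']).describe().loc[:, pd.IndexSlice[:, ['count', '50%']]]
grouped.columns = ['_'.join(col) for col in grouped.columns]  # e.g. 'foo_count'
columns_of_interest = list(grouped.columns)

# With single-level value columns, pivot accepts the list of values.
flat = grouped.reset_index().pivot(index='dt', columns='cat', values=columns_of_interest)
print(flat)
```

The result has two column levels (flattened name, then cat), with NaN where a dt/cat combination does not exist.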
Upvotes: 1
Reputation: 29635
IIUC, you can use unstack after the groupby (no reset_index):
df = pd.DataFrame({
    'foo': [1, 2, 3], 'bar': [4, 5, 6],
    'dt': ['2020-01-01', '2020-01-01', '2020-01-02'], 'cat': ['a', 'b', 'b']
})
df_ = df.groupby(['dt', 'cat']).describe()\
        .loc[:, pd.IndexSlice[:, ['count', '50%']]]\
        .unstack()  # unstack instead of reset_index
print(df_)
             foo                 bar
           count       50%     count       50%
cat            a    b    a    b     a    b    a    b
dt
2020-01-01   1.0  1.0  1.0  2.0   1.0  1.0  4.0  5.0
2020-01-02   NaN  1.0  NaN  3.0   NaN  1.0  NaN  6.0
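If you still want to go the loop route from the question, one possible way to recombine the per-column pivots is pd.concat with a dict of pieces. This is only a sketch: the names grouped, pieces, combined and counts are made up for illustration, and it assumes the grouped frame is used before reset_index.

```python
import pandas as pd

df = pd.DataFrame({
    'foo': [1, 2, 3], 'bar': [4, 5, 6],
    'dt': ['2020-01-01', '2020-01-01', '2020-01-02'], 'cat': ['a', 'b', 'b']
})
grouped = df.groupby(['dt', 'cat']).describe().loc[:, pd.IndexSlice[:, ['count', '50%']]]

# One pivot per (column, aggregation) pair, as in the question's loop.
pieces = {}
for col, agg in grouped.columns:
    long_form = grouped[(col, agg)].reset_index(name='value')  # columns: dt, cat, value
    pieces[(col, agg)] = long_form.pivot(index='dt', columns='cat', values='value')

# The tuple keys become the two outer column levels; cat stays innermost.
combined = pd.concat(pieces, axis=1)

# Keep a single aggregation to get the (column, cat) layout from the question.
counts = combined.xs('count', axis=1, level=1)
print(combined)
print(counts)
```

The combined frame should hold the same values as the unstack result above; unstack just gets there in one step.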
Upvotes: 1