Reputation: 1309
How could I unstack this dataframes for both type and category - without creating new df and merging back on, and without creating the "apple" column using a manual sum after unstacking? Need generalisable code for a df with many fruits and categories
df
area type sale category date
0 london apple 20 green_apple 2010-01-01
1 london apple 30 red_apple 2010-01-01
2 france apple 40 green_apple 2010-01-01
3 london banana 17 yellow_banana 2010-01-01
Expected Output
green_apple yellow_banana red_apple apple banana
london 2010-01-01 20 17 30 50 17
...
france 2010-01-01 40 0 0 40 0
...
I tried
df.set_index(['date', 'area', 'type', 'category'])['sale'].unstack()
Upvotes: 2
Views: 163
Reputation: 8219
You can do the standard pivot_table
:
df.pivot_table(columns = ['type','category'], values = 'sale', index = ['area','date'])
but it gives you a multi-index table that does not look quite like what you want:
type apple banana
category green_apple red_apple yellow_banana
area date
france 2010-01-01 40.0 NaN NaN
london 2010-01-01 20.0 30.0 17.0
One solution is to do two pivot_table
s separately and munge them together -- note the aggfunc
we need to use for 'type' (and let's use for 'category' just in case)
(df.pivot_table(columns = ['type'], values = 'sale', index = ['area','date'], aggfunc = 'sum')
.join(
df.pivot_table(columns = ['category'], values = 'sale', index = ['area','date'], aggfunc = 'sum')
)
.fillna(0)
)
to get
apple banana green_apple red_apple yellow_banana
area date
france 2010-01-01 40.0 0.0 40.0 0.0 0.0
london 2010-01-01 50.0 17.0 20.0 30.0 17.0
pivot_table
... but with an extra melt
(df.melt(id_vars = ['area', 'date', 'sale'], value_vars = ['type','category'] )
.pivot_table(columns = ['value'], values = 'sale', index = ['area','date'], aggfunc = 'sum')
.fillna(0)
)
Upvotes: 2