asd
asd

Reputation: 1309

Unstack on two columns

How could I unstack this dataframes for both type and category - without creating new df and merging back on, and without creating the "apple" column using a manual sum after unstacking? Need generalisable code for a df with many fruits and categories

df
        area    type  sale       category         date

0     london  apple  20         green_apple      2010-01-01
1     london  apple  30         red_apple        2010-01-01
2     france  apple  40         green_apple      2010-01-01
3     london  banana 17        yellow_banana     2010-01-01

Expected Output

                          green_apple   yellow_banana    red_apple  apple       banana
london   2010-01-01           20            17              30        50          17
         ...
france   2010-01-01           40           0              0         40            0
...

I tried

df.set_index(['date', 'area', 'type', 'category'])['sale'].unstack()

Upvotes: 2

Views: 163

Answers (1)

piterbarg
piterbarg

Reputation: 8219

You can do the standard pivot_table:

df.pivot_table(columns = ['type','category'], values = 'sale', index = ['area','date'])

but it gives you a multi-index table that does not look quite like what you want:

    type            apple                   banana
category            green_apple red_apple   yellow_banana
area    date            
france  2010-01-01  40.0        NaN         NaN
london  2010-01-01  20.0        30.0        17.0

One solution is to do two pivot_tables separately and munge them together -- note the aggfunc we need to use for 'type' (and let's use for 'category' just in case)

(df.pivot_table(columns = ['type'], values = 'sale', index = ['area','date'], aggfunc = 'sum')
   .join(
       df.pivot_table(columns = ['category'], values = 'sale', index = ['area','date'], aggfunc = 'sum')
   )
   .fillna(0)
)

to get

                    apple   banana  green_apple red_apple   yellow_banana
area    date                    
france  2010-01-01  40.0    0.0     40.0        0.0         0.0
london  2010-01-01  50.0    17.0    20.0        30.0        17.0

edit: solution with a single pivot_table

... but with an extra melt

(df.melt(id_vars = ['area', 'date', 'sale'], value_vars = ['type','category'] )
   .pivot_table(columns = ['value'], values = 'sale', index = ['area','date'], aggfunc = 'sum')
   .fillna(0)
)

Upvotes: 2

Related Questions