Niuya
Niuya

Reputation: 424

pandas pivot and group

city  sale_date    sale1   sale2  sale3 
city1  2020/07/08   100      200    300
city1  2020/07/09   200      300    400
city2  2020/07/08   111      222    333
...

I want to get bellow dataframe

               2020/07/08   2020/07/09 ...
city1  sale1   100          200
       sale2   200          300
       sale3   300          400    
city2  sale1   111          NaN
       sale2   222          NaN
       sale3   333          NaN

I tried to use pd.pivot, but it just have multiple duplicate columns on dates

If I pivot 3 times and then stack/concat them

sale1_df = df.pivot(index='city',columns='sale_date',values='sale1')
sale2_df = df.pivot(index='city',columns='sale_date',values='sale2')
sale3_df = df.pivot(index='city',columns='sale_date',values='sale3')

It might work after sorting. but is there an easier way to achieve this? I can't imaging if have more sales (eg: sale4,sale5,...)

Upvotes: 0

Views: 63

Answers (3)

BENY
BENY

Reputation: 323226

Adding stack at the end

out = df.pivot(index = 'city',columns='sale_date').stack(level=0)

Upvotes: 1

wwnde
wwnde

Reputation: 26676

Set index, stack and unstack on the column name you need the values degenerated into columns

  df.set_index(['city', 'sale_date']).stack().unstack('sale_date').reset_index().rename(columns={'level_1': 'sale_type'})

sale_date   city sale_type  2020/07/08  2020/07/09
0          city1     sale1       100.0       200.0
1          city1     sale2       200.0       300.0
2          city1     sale3       300.0       400.0
3          city2     sale1       111.0         NaN
4          city2     sale2       222.0         NaN
5          city2     sale3       333.0         NaN

Upvotes: 2

scotscotmcc
scotscotmcc

Reputation: 3113

You can accomplish this by first using df.melt() and then df.pivot_table().

Melt() does a sort of un-pivot, and so it will turn your sale1, sale2... into rows instead of columns. Then the pivot_table() will turn your 'sale_date' into columns instead of rows.

df2 = df.melt(id_vars=['city','sale_date'])
df2
city    sale_date   variable    value
0   city1   2020/07/08  sale1   100
1   city1   2020/07/09  sale1   200
2   city2   2020/07/08  sale1   111
3   city1   2020/07/08  sale2   200
4   city1   2020/07/09  sale2   300
5   city2   2020/07/08  sale2   222
6   city1   2020/07/08  sale3   300
7   city1   2020/07/09  sale3   400
8   city2   2020/07/08  sale3   333

You can set the column names for 'variable' and 'value' per the documentation here - https://pandas.pydata.org/docs/reference/api/pandas.melt.html

Then you can pivot_table() on this

df2.pivot_table(index=['city','variable'],columns='sale_date',values='value')
      sale_date 2020/07/08  2020/07/09
city    variable        
city1   sale1   100.0   200.0
        sale2   200.0   300.0
        sale3   300.0   400.0
city2   sale1   111.0   NaN
        sale2   222.0   NaN
        sale3   333.0   NaN

Upvotes: 2

Related Questions