Mani
Mani

Reputation: 39

Python summary with row value to column

I have the below input

Col1  col2   col3
AA.   Wk1.   10
BB.   Wk2.   30
CC.   Wk3.   5
DD.   Wk4.   20
AA.   Wk4.   5
AA.   Wk5.   10 

I want the output using pandas or any python package as below

Col1.  Wk1 Wk2 Wk3 wk4 wk5 difflast
AA.    10.  0.  0.  5.  10.  5
BB.     0.  30. 0.  0.   0.  0
CC.     0.  0.  5.  0.   0.  0
DD.     0.  0.  0.  20.  0.  -20 

If the week is added then it should take last two week difference n output like above ... Need help

Upvotes: 1

Views: 148

Answers (3)

Trenton McKinney
Trenton McKinney

Reputation: 62523

  • Use pandas.DataFrame.pivot, which is for transforming from long to wide format.
    • If you don't want the columns named, col2 in this case, then use dfp.columns.name = None
  • Subtract the last two columns using .iloc.
    • In this case, : gets all rows
    • -1 gets the last column and -2 gets the second to last column.
  • .mean(axis=1) takes the row-wise mean. 0 are included in the calculation.
    • iloc[:, -1] gets the last column only
      • -1 is always the last column, -2 is second to last, and so on.
    • .iloc[:, 0:-1] gets all columns except the last column.
      • Just like regular slicing, when selecting a range 0:-1, the last value isn't included.
dfp = df.pivot(index='Col1', columns='col2', values='col3').fillna(0)

col2  Wk1.  Wk2.  Wk3.  Wk4.  Wk5.
Col1                              
AA.   10.0   0.0   0.0   5.0  10.0
BB.    0.0  30.0   0.0   0.0   0.0
CC.    0.0   0.0   5.0   0.0   0.0
DD.    0.0   0.0   0.0  20.0   0.0

# take difference
dfp['difflast'] = dfp.iloc[:, -1] - dfp.iloc[:, -2]

col2  Wk1.  Wk2.  Wk3.  Wk4.  Wk5.  difflast
Col1                                        
AA.   10.0   0.0   0.0   5.0  10.0       5.0
BB.    0.0  30.0   0.0   0.0   0.0       0.0
CC.    0.0   0.0   5.0   0.0   0.0       0.0
DD.    0.0   0.0   0.0  20.0   0.0     -20.0

With difflast - subtract Wk5 from average of previous weeks

dfp['last - avg'] = dfp.iloc[:, -2] - dfp.iloc[:, 0:-2].mean(axis=1)

col2  Wk1.  Wk2.  Wk3.  Wk4.  Wk5.  difflast  last - avg
Col1                                                    
AA.   10.0   0.0   0.0   5.0  10.0       5.0        6.25
BB.    0.0  30.0   0.0   0.0   0.0       0.0       -7.50
CC.    0.0   0.0   5.0   0.0   0.0       0.0       -1.25
DD.    0.0   0.0   0.0  20.0   0.0     -20.0       -5.00

Without difflast - subtract Wk5 from average of previous weeks

dfp['last - avg'] = dfp.iloc[:, -1] - dfp.iloc[:, 0:-1].mean(axis=1)

col2  Wk1.  Wk2.  Wk3.  Wk4.  Wk5.  last - avg
Col1                                          
AA.   10.0   0.0   0.0   5.0  10.0        6.25
BB.    0.0  30.0   0.0   0.0   0.0       -7.50
CC.    0.0   0.0   5.0   0.0   0.0       -1.25
DD.    0.0   0.0   0.0  20.0   0.0       -5.00

Upvotes: 2

Ch3steR
Ch3steR

Reputation: 20679

You can use df.unstack

(df.set_index(['Col1', 'col2'])['col3'].unstack(fill_value=0).
    assign(d = lambda x: x['Wk5.'] - x['Wk4.']))

col2  Wk1.  Wk2.  Wk3.  Wk4.  Wk5.   d
Col1
AA.     10     0     0     5    10   5
BB.      0    30     0     0     0   0
CC.      0     0     5     0     0   0
DD.      0     0     0    20     0 -20

Upvotes: 1

Umar.H
Umar.H

Reputation: 23099

IIUC, we can use groupby and unstack then diff() whilst selectively using list slicing to get the last two columns.

s = df.groupby(['Col1','col2']).agg('sum').unstack(1)\
      .fillna(0).droplevel(0,1)\
       .reset_index()


s['difflast'] = s.iloc[:,-2:].diff(axis=1).iloc[:,-1]

print(s)
col2 Col1  Wk1.  Wk2.  Wk3.  Wk4.  Wk5.  difflast
0     AA.  10.0   0.0   0.0   5.0  10.0       5.0
1     BB.   0.0  30.0   0.0   0.0   0.0       0.0
2     CC.   0.0   0.0   5.0   0.0   0.0       0.0
3     DD.   0.0   0.0   0.0  20.0   0.0     -20.0

Upvotes: 1

Related Questions