Reputation: 39
I have the below input
Col1 col2 col3
AA. Wk1. 10
BB. Wk2. 30
CC. Wk3. 5
DD. Wk4. 20
AA. Wk4. 5
AA. Wk5. 10
I want the output using pandas or any python package as below
Col1. Wk1 Wk2 Wk3 wk4 wk5 difflast
AA. 10. 0. 0. 5. 10. 5
BB. 0. 30. 0. 0. 0. 0
CC. 0. 0. 5. 0. 0. 0
DD. 0. 0. 0. 20. 0. -20
If the week is added then it should take last two week difference n output like above ... Need help
Upvotes: 1
Views: 148
Reputation: 62523
col2
in this case, then use dfp.columns.name = None
.iloc
.
:
gets all rows-1
gets the last column and -2
gets the second to last column..mean(axis=1)
takes the row-wise mean. 0
are included in the calculation.
iloc[:, -1]
gets the last column only
-1
is always the last column, -2
is second to last, and so on..iloc[:, 0:-1]
gets all columns except the last column.
0:-1
, the last value isn't included.dfp = df.pivot(index='Col1', columns='col2', values='col3').fillna(0)
col2 Wk1. Wk2. Wk3. Wk4. Wk5.
Col1
AA. 10.0 0.0 0.0 5.0 10.0
BB. 0.0 30.0 0.0 0.0 0.0
CC. 0.0 0.0 5.0 0.0 0.0
DD. 0.0 0.0 0.0 20.0 0.0
# take difference
dfp['difflast'] = dfp.iloc[:, -1] - dfp.iloc[:, -2]
col2 Wk1. Wk2. Wk3. Wk4. Wk5. difflast
Col1
AA. 10.0 0.0 0.0 5.0 10.0 5.0
BB. 0.0 30.0 0.0 0.0 0.0 0.0
CC. 0.0 0.0 5.0 0.0 0.0 0.0
DD. 0.0 0.0 0.0 20.0 0.0 -20.0
dfp['last - avg'] = dfp.iloc[:, -2] - dfp.iloc[:, 0:-2].mean(axis=1)
col2 Wk1. Wk2. Wk3. Wk4. Wk5. difflast last - avg
Col1
AA. 10.0 0.0 0.0 5.0 10.0 5.0 6.25
BB. 0.0 30.0 0.0 0.0 0.0 0.0 -7.50
CC. 0.0 0.0 5.0 0.0 0.0 0.0 -1.25
DD. 0.0 0.0 0.0 20.0 0.0 -20.0 -5.00
dfp['last - avg'] = dfp.iloc[:, -1] - dfp.iloc[:, 0:-1].mean(axis=1)
col2 Wk1. Wk2. Wk3. Wk4. Wk5. last - avg
Col1
AA. 10.0 0.0 0.0 5.0 10.0 6.25
BB. 0.0 30.0 0.0 0.0 0.0 -7.50
CC. 0.0 0.0 5.0 0.0 0.0 -1.25
DD. 0.0 0.0 0.0 20.0 0.0 -5.00
Upvotes: 2
Reputation: 20679
You can use df.unstack
(df.set_index(['Col1', 'col2'])['col3'].unstack(fill_value=0).
assign(d = lambda x: x['Wk5.'] - x['Wk4.']))
col2 Wk1. Wk2. Wk3. Wk4. Wk5. d
Col1
AA. 10 0 0 5 10 5
BB. 0 30 0 0 0 0
CC. 0 0 5 0 0 0
DD. 0 0 0 20 0 -20
Upvotes: 1
Reputation: 23099
IIUC, we can use groupby
and unstack
then diff()
whilst selectively using list slicing to get the last two columns.
s = df.groupby(['Col1','col2']).agg('sum').unstack(1)\
.fillna(0).droplevel(0,1)\
.reset_index()
s['difflast'] = s.iloc[:,-2:].diff(axis=1).iloc[:,-1]
print(s)
col2 Col1 Wk1. Wk2. Wk3. Wk4. Wk5. difflast
0 AA. 10.0 0.0 0.0 5.0 10.0 5.0
1 BB. 0.0 30.0 0.0 0.0 0.0 0.0
2 CC. 0.0 0.0 5.0 0.0 0.0 0.0
3 DD. 0.0 0.0 0.0 20.0 0.0 -20.0
Upvotes: 1