Group-by and add new calculated column in Python

Question

I have this DataFrame in Python(assume that it sorted):

Col1 Col2 Col3
a    1    2
a    2    3
a    4    6
b    3    7
b    5    1

and I want to add a new column named Col4 where each value is the addition of the current Col3 and the previous row Col3.

It should run this logic by grouping Col1, and if it is the first row in the group it uses only the current value Col3.

So it should look like this:

Col1 Col2 Col3 Col4
a    1    2    2 (i.e. 2 - it is the first in first row group)
a    2    3    5 (i.e. 2+3)
a    4    6    9 (i.e. 3+6)
b    3    7    7 (i.e. 7 - it is the first row in second group)   
b    5    1    8 (i.e. 7+1)

How can it be done in Python (hopefully using groupBy)?

Thanks

Colonel Beauvel · Accepted Answer

You can use shift per group:

df['Col4'] = df['Col3'] + df.groupby('Col1')['Col3'].shift(1).fillna(0)

>>> df
  Col1  Col2  Col3  Col4
0    a     1     2   2.0
1    a     2     3   5.0
2    a     4     6   9.0
3    b     3     7   7.0
4    b     5     1   8.0

Group-by and add new calculated column in Python

Answers (2)

Related Questions