Nir
Nir

Reputation: 601

Group-by and add new calculated column in Python

I have this DataFrame in Python(assume that it sorted):

Col1 Col2 Col3
a    1    2
a    2    3
a    4    6
b    3    7
b    5    1

and I want to add a new column named Col4 where each value is the addition of the current Col3 and the previous row Col3.

It should run this logic by grouping Col1, and if it is the first row in the group it uses only the current value Col3.

So it should look like this:

Col1 Col2 Col3 Col4
a    1    2    2 (i.e. 2 - it is the first in first row group)
a    2    3    5 (i.e. 2+3)
a    4    6    9 (i.e. 3+6)
b    3    7    7 (i.e. 7 - it is the first row in second group)   
b    5    1    8 (i.e. 7+1)

How can it be done in Python (hopefully using groupBy)?

Thanks

Upvotes: 2

Views: 1317

Answers (2)

Colonel Beauvel
Colonel Beauvel

Reputation: 31171

You can use shift per group:

df['Col4'] = df['Col3'] + df.groupby('Col1')['Col3'].shift(1).fillna(0)

>>> df
  Col1  Col2  Col3  Col4
0    a     1     2   2.0
1    a     2     3   5.0
2    a     4     6   9.0
3    b     3     7   7.0
4    b     5     1   8.0

Upvotes: 2

BENY
BENY

Reputation: 323226

By using rolling sum

df['col4']=df.groupby('Col1').Col3.rolling(2,min_periods=1).sum().values.astype(int)
df
Out[105]: 
  Col1  Col2  Col3  col4
0    a     1     2     2
1    a     2     3     5
2    a     4     6     9
3    b     3     7     7
4    b     5     1     8

Upvotes: 1

Related Questions