Reputation: 588
I have a dataframe df
:
import pandas as pd
df = pd.DataFrame({'A': [1, 1, 1,2,2,2,2],
'B': [10, 0, 0,5,0,0,0],
'C': [1,1,2,2,3,3,3],
'D': [2,3,4,5,2,3,4]})
which looks like:
A B C D
0 1 10 1 2
1 1 0 1 3
2 1 0 2 4
3 2 5 2 5
4 2 0 3 2
5 2 0 3 3
6 2 0 3 4
I want to compute the value in column B
only for those locations where it is 0 for all groups (1,2 as per example data) denoted in column A
.
value of column B
= value of column B in previous record + value of col C
in same record + value of col D
in same record.
My expected output is:
A B C D
0 1 10 1 2
1 1 14 1 3
2 1 20 2 4
3 2 5 2 5
4 2 10 3 2
5 2 16 3 3
6 2 23 3 4
How can I do it in pandas ?
Upvotes: 3
Views: 74
Reputation: 323226
IIUC two cumsum
s=df[['C','D']].mask(df.B!=0)
df.groupby('A').B.cumsum()+s.groupby(df.A).cumsum().sum(1)
Out[633]:
0 10.0
1 14.0
2 20.0
3 5.0
4 10.0
5 16.0
6 23.0
dtype: float64
Upvotes: 1
Reputation: 11381
This should do it:
def f(g):
g.B = (g.B.shift() + g.C + g.D).cumsum()
return g
df.B.replace(0, df.groupby('A').apply(f).B)
The result is:
A B C D
0 1 10 1 2
1 1 14 1 3
2 1 20 2 4
3 2 5 2 5
4 2 10 3 2
5 2 16 3 3
6 2 23 3 4
Upvotes: 4