Reputation: 4398
I have a dataframe, df, where I would like to add a calculated field column and subtract from this number for every consecutive date period.
Data
base id date con retro finalc sp
100 aa q122 5 1 4 159
100 aa q222 10 1 9 50
50 bb q122 10 0 10 100
50 bb q222 5 1 4 70
100 aa q322 5 1 4 158
Desired
base id date con retro finalc sp start
100 aa q122 5 1 4 159 96
100 aa q222 10 1 9 50 87
50 bb q122 10 0 10 100 40
50 bb q222 5 1 4 70 36
100 aa q322 5 1 4 158 83
I would like the 'finalc' column to be subtracted from the 'base' column, which gives the 'start' column ex for the first entry, 'base' 100 - 'finalc' 4 = 'start' 96
What I am having trouble with is the next step. I would like to keep subtracting the finalc value from the start column, given a consecutive date for a given id.
For instance, id 'aa' has a 'start' of 96, however, in the next time period, q222, 'finalc' value needs to be subtracted from this, giving the value of 87 and so on.
The initial start begins at the earliest date per id
for aa:
100(base) - 4(finalc)
96-9 = 87
87-4 = 83
Doing
df['start']=df.eval("base-finalc")
I only have the solution for the first part. Any suggestion is appreciated.
Upvotes: 1
Views: 39
Reputation: 862751
Use GroupBy.cumsum
with subtract:
df['start'] = df['base'].sub(df.groupby('id')['finalc'].cumsum())
print (df)
base id date con retro finalc sp start
0 100 aa q122 5 1 4 159 96
1 100 aa q222 10 1 9 50 87
2 50 bb q122 10 0 10 100 40
3 50 bb q222 5 1 4 70 36
4 100 aa q322 5 1 4 158 83
Upvotes: 1