Perform calculation based on consecutive time

Question

I have a dataframe, df, where I would like to add a calculated field column and subtract from this number for every consecutive date period.

Data

base    id  date    con retro   finalc  sp
100     aa  q122    5   1       4       159
100     aa  q222    10  1       9       50
50      bb  q122    10  0       10      100
50      bb  q222    5   1       4       70
100     aa  q322    5   1       4       158

Desired

base    id  date    con retro   finalc  sp  start
100     aa  q122    5   1       4       159 96
100     aa  q222    10  1       9       50  87
50      bb  q122    10  0       10      100 40
50      bb  q222    5   1       4       70  36
100     aa  q322    5   1       4       158 83

I would like the 'finalc' column to be subtracted from the 'base' column, which gives the 'start' column ex for the first entry, 'base' 100 - 'finalc' 4 = 'start' 96

What I am having trouble with is the next step. I would like to keep subtracting the finalc value from the start column, given a consecutive date for a given id.

For instance, id 'aa' has a 'start' of 96, however, in the next time period, q222, 'finalc' value needs to be subtracted from this, giving the value of 87 and so on.

The initial start begins at the earliest date per id for aa: 100(base) - 4(finalc)
96-9 = 87 87-4 = 83

Doing

df['start']=df.eval("base-finalc")

I only have the solution for the first part. Any suggestion is appreciated.

jezrael · Accepted Answer

Use GroupBy.cumsum with subtract:

df['start'] = df['base'].sub(df.groupby('id')['finalc'].cumsum())
print (df)
   base  id  date  con  retro  finalc   sp  start
0   100  aa  q122    5      1       4  159     96
1   100  aa  q222   10      1       9   50     87
2    50  bb  q122   10      0      10  100     40
3    50  bb  q222    5      1       4   70     36
4   100  aa  q322    5      1       4  158     83

Perform calculation based on consecutive time

Answers (1)

Related Questions