How to add result of previous row to contents of present row?

Question

import pandas as pd
import numpy as np

df = pd.DataFrame({"a": [7, 2, 3], "b": [4, 5, 6], "c": [100, np.nan, np.NaN]})

df
Out[11]: 
   a  b      c
0  7  4  100.0
1  2  5    NaN
2  3  6    NaN

For the above Python pandas DataFrame, I would like to do for row number zero: calculate the difference between column a and b. The result of this should be added to the contents of column c and stored in a (new) column d.

For row number one, in the first step, the contents of row zero / column d should be stored in column c. Afterwards the same algorithm like before for row zero should be applied.

After performing this, the resulting DataFrame would look like:

   a  b      c     d
0  7  4  100.0   103.
1  2  5  103.0   100.
2  3  6  100.0    97.

In reality, the dataframe has much more rows than this small example. Therefore a fast computational speed is quite important.

How would a solution for computing this new dataframe look like?

sacuL · Accepted Answer

You can get the cumulative sum of the difference between a and b, add that to your column c initial value, and populate the rest of c with your newly calculated d, shifted down by 1:

df['d'] = df.a.sub(df.b).cumsum().add(df.c.iloc[0])

df.loc[1:,'c'] = df.d.shift()

>>> df
   a  b      c      d
0  7  4  100.0  103.0
1  2  5  103.0  100.0
2  3  6  100.0   97.0

How to add result of previous row to contents of present row?

Answers (1)

Related Questions