Georges Cunty
Georges Cunty

Reputation: 51

Python pandas dataframe add previous row values

I have a pandas dataframe that looks like this:

                     AAPL   IBM  GOOG  XOM
2011-01-10 16:00:00  1500     0     0    0
2011-01-11 16:00:00     0     0     0    0
2011-01-12 16:00:00     0     0     0    0
2011-01-13 16:00:00 -1500  4000     0    0
2011-01-14 16:00:00     0     0     0    0
2011-01-18 16:00:00     0     0     0    0

My goal is to fill the rows by adding the previous row values. The result would look like this:

                     AAPL   IBM  GOOG  XOM
2011-01-10 16:00:00  1500     0     0    0
2011-01-11 16:00:00  1500     0     0    0
2011-01-12 16:00:00  1500     0     0    0
2011-01-13 16:00:00     0  4000     0    0
2011-01-14 16:00:00     0  4000     0    0
2011-01-18 16:00:00     0  4000     0    0

I tried to iterate through the dataframe index with

    for date in df.index:

and to increment dates with

    dt_nextDate = date + dt.timedelta(days=1)

but there are gaps in the dataframe index that stand for weekends.

Can I iterate through the index from the second row to the end, refer back to the previous row and add the values?

Upvotes: 5

Views: 9993

Answers (1)

Viktor Kerkez
Viktor Kerkez

Reputation: 46596

Your example result is not the output of your example algorithm, so I'm not sure what are you exactly asking for?

The desired result you showed is a cumulative sum, which you can get using:

>>> df.cumsum()
                    AAPL   IBM  GOOG  XOM
index                                    
2011-01-1016:00:00  1500     0     0    0
2011-01-1116:00:00  1500     0     0    0
2011-01-1216:00:00  1500     0     0    0
2011-01-1316:00:00     0  4000     0    0
2011-01-1416:00:00     0  4000     0    0
2011-01-1816:00:00     0  4000     0    0

But the thing you explained you want and the algorithm you showed, are more likely to be a rolling sum with a window size equals to 2:

>>> result = pd.rolling_sum(df, 2)
>>> result
                    AAPL   IBM  GOOG  XOM
index                                    
2011-01-1016:00:00   NaN   NaN   NaN  NaN
2011-01-1116:00:00  1500     0     0    0
2011-01-1216:00:00     0     0     0    0
2011-01-1316:00:00 -1500  4000     0    0
2011-01-1416:00:00 -1500  4000     0    0
2011-01-1816:00:00     0     0     0    0

To fix the NaNs just do:

>>> result.iloc[0,:] = df.iloc[0,:]
>>> result
                    AAPL   IBM  GOOG  XOM
index                                    
2011-01-1016:00:00  1500     0     0    0
2011-01-1116:00:00  1500     0     0    0
2011-01-1216:00:00     0     0     0    0
2011-01-1316:00:00 -1500  4000     0    0
2011-01-1416:00:00 -1500  4000     0    0
2011-01-1816:00:00     0     0     0    0

Upvotes: 9

Related Questions