Reputation: 51
I have a pandas dataframe that looks like this:
AAPL IBM GOOG XOM
2011-01-10 16:00:00 1500 0 0 0
2011-01-11 16:00:00 0 0 0 0
2011-01-12 16:00:00 0 0 0 0
2011-01-13 16:00:00 -1500 4000 0 0
2011-01-14 16:00:00 0 0 0 0
2011-01-18 16:00:00 0 0 0 0
My goal is to fill the rows by adding the previous row values. The result would look like this:
AAPL IBM GOOG XOM
2011-01-10 16:00:00 1500 0 0 0
2011-01-11 16:00:00 1500 0 0 0
2011-01-12 16:00:00 1500 0 0 0
2011-01-13 16:00:00 0 4000 0 0
2011-01-14 16:00:00 0 4000 0 0
2011-01-18 16:00:00 0 4000 0 0
I tried to iterate through the dataframe index with
for date in df.index:
and to increment dates with
dt_nextDate = date + dt.timedelta(days=1)
but there are gaps in the dataframe index that stand for weekends.
Can I iterate through the index from the second row to the end, refer back to the previous row and add the values?
Upvotes: 5
Views: 9993
Reputation: 46596
Your example result is not the output of your example algorithm, so I'm not sure exactly what you are asking for.
The desired result you showed is a cumulative sum, which you can get using:
>>> df.cumsum()
AAPL IBM GOOG XOM
index
2011-01-10 16:00:00 1500 0 0 0
2011-01-11 16:00:00 1500 0 0 0
2011-01-12 16:00:00 1500 0 0 0
2011-01-13 16:00:00 0 4000 0 0
2011-01-14 16:00:00 0 4000 0 0
2011-01-18 16:00:00 0 4000 0 0
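For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the frame from the question and applies the cumulative sum. The variable names `index` and `positions` are my own, and I'm assuming the index is a plain `DatetimeIndex` built from the timestamps shown in the question.

```python
import pandas as pd

# Rebuild the example frame from the question (timestamps copied from the post).
index = pd.to_datetime([
    "2011-01-10 16:00:00",
    "2011-01-11 16:00:00",
    "2011-01-12 16:00:00",
    "2011-01-13 16:00:00",
    "2011-01-14 16:00:00",
    "2011-01-18 16:00:00",
])
df = pd.DataFrame(
    {
        "AAPL": [1500, 0, 0, -1500, 0, 0],
        "IBM": [0, 0, 0, 4000, 0, 0],
        "GOOG": [0, 0, 0, 0, 0, 0],
        "XOM": [0, 0, 0, 0, 0, 0],
    },
    index=index,
)

# Running total down each column; gaps in the dates (weekends) do not matter
# because cumsum works on row order, not on the date values.
positions = df.cumsum()
print(positions)
```

Since `cumsum` only depends on row order, there is no need to step through the dates with `timedelta` at all.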
But what you described in words, and the algorithm you showed, look more like a rolling sum with a window size of 2 (in older pandas releases this was written pd.rolling_sum(df, 2), which has since been removed):
>>> result = df.rolling(2).sum()
>>> result
AAPL IBM GOOG XOM
index
2011-01-10 16:00:00 NaN NaN NaN NaN
2011-01-11 16:00:00 1500 0 0 0
2011-01-12 16:00:00 0 0 0 0
2011-01-13 16:00:00 -1500 4000 0 0
2011-01-14 16:00:00 -1500 4000 0 0
2011-01-18 16:00:00 0 0 0 0
To fix the NaNs, just do:
>>> result.iloc[0,:] = df.iloc[0,:]
>>> result
AAPL IBM GOOG XOM
index
2011-01-10 16:00:00 1500 0 0 0
2011-01-11 16:00:00 1500 0 0 0
2011-01-12 16:00:00 0 0 0 0
2011-01-13 16:00:00 -1500 4000 0 0
2011-01-14 16:00:00 -1500 4000 0 0
2011-01-18 16:00:00 0 0 0 0
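As a possible alternative to patching the first row by hand, the sketch below gets the same "current row plus previous row" result in one step. It assumes a reasonably recent pandas (the `rolling` method and the `fill_value` argument of `shift`); the names `rolled` and `shifted` are just for illustration.

```python
import pandas as pd

# Same example frame as in the question.
index = pd.to_datetime([
    "2011-01-10 16:00:00",
    "2011-01-11 16:00:00",
    "2011-01-12 16:00:00",
    "2011-01-13 16:00:00",
    "2011-01-14 16:00:00",
    "2011-01-18 16:00:00",
])
df = pd.DataFrame(
    {
        "AAPL": [1500, 0, 0, -1500, 0, 0],
        "IBM": [0, 0, 0, 4000, 0, 0],
        "GOOG": [0, 0, 0, 0, 0, 0],
        "XOM": [0, 0, 0, 0, 0, 0],
    },
    index=index,
)

# min_periods=1 lets the first row use a window of one value,
# so it comes out as the row itself instead of NaN.
rolled = df.rolling(window=2, min_periods=1).sum()

# Equivalent formulation: add each row to the row above it,
# treating the missing row above the first one as zeros.
shifted = df + df.shift(1, fill_value=0)

print(rolled)
print(shifted)
```

The rolling version returns floats while the shift version keeps the integer dtype, so pick whichever suits the rest of your code.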
Upvotes: 9