Reputation: 6242
I have the following dataframe set:
Date Amount Total
0 2017-02-11 -11.7
1 2017-02-08 -110.7 1377.4
2 2017-02-08 -1.2
3 2017-02-07 -11.1 1489.3
4 2017-02-07 8.3 1500.4
5 2017-02-03 -5.2 1492.1
6 2017-02-01 -12.0
Since there are datasets where the total amount might be missing, I have to calculate that one myself. The best way I figured might be to search for the latest existing row, that contains a total value with something like
df = df.sort_values(by=['Date'], ascending=[False])
df[pd.notnull(df.Total)].iloc[0]
This will give me:
5 2017-02-08 -5.2 1492.1
And then use the total value 1492.1 as a base to calculate all the others. Is there an easy way of doing this, or do I have to iterate myself through the datasets calculating the totals from the previous/next dates?
EDIT: The totals should be calculated by using the x's previous total and adding all negative amounts/subtracting all positive amounts until x.
In example, to calculate the total of index 2 (2017-02-08) I would have to take 1377.4 and add 110.7 to it to receive the total of index 2, which then is 1488.1
Upvotes: 0
Views: 118
Reputation: 19947
You can use cumsum and then plus the base number. (df[::-1].Total-df[::-1].Amount.cumsum()).max() gives you the total prior to the last row. you can then add that base to the cumsum results. Also, as your data is added from bottom up, you need to reverse the rows using df[::-1] first.
df['Total_New']=df[::-1].Amount.cumsum()+(df[::-1].Total-df[::-1].Amount.cumsum()).max()
df
Out[219]:
Date Amount Total Total_New
0 2017-02-11 -11.7 NaN 1365.7
1 2017-02-08 -110.7 1377.4 1377.4
2 2017-02-08 -1.2 NaN 1488.1
3 2017-02-07 -11.1 1489.3 1489.3
4 2017-02-07 8.3 1500.4 1500.4
5 2017-02-03 -5.2 1492.1 1492.1
6 2017-02-01 -12.0 NaN 1497.3
Upvotes: 1