Reputation: 703
I am trying to apply a lambda function to a dataframe by referencing three columns. I want to update one of the columns, Cumulative Total
, based on the following logic:
If it's on the first row, then Cumulative Total
should equal the value in Total
.
If it's not the first row, then apply the following formula that references the prior row:
x.shift()['Cumulative Total']
- (x.shift()['Total'] * (x.shift()['Annualized Rate'] / 1200))
I want the Cumulative Total
column to look like so:
Total Annualized Rate Cumulative Total
869 11.04718067 869
868 5.529953917 861
871 8.266360505 857
873 6.872852234 851
873 8.24742268 846
874 9.610983982 840
870 5.517241379 833
871 8.266360505 829
868 2.764976959 823
What is throwing me off is how I can determine whether or not I'm on the first row. This sounds rather trivial, but I'm very new to Pandas and am totally stumped. iloc
doesn't seem to work, as it seems to only be used for grabbing a row of a given index.
The code is currently as follows:
df['Cumulative Total'] = df.apply(lambda x: x['Total'] if x.iloc[0] else x.shift()['Cumulative Total']-(x.shift()['Total']*(x.shift()['Annualized Rate']/1200)),axis=1)
The statement if x.iloc[0]
is wrong. Any idea on how I can determine if it's the first row?
Edit: thank you all for your answers. Alexander's answer is on the right track, but I've noticed that the results strayed somewhat from what was to be expected. These differences became more pronounced the larger the dataframe used.
Alexander - can you address this issue with an edit to your answer? Using vanilla Python, I've arrived at the results below. The differences are largely trivial, but as stated, can get more pronounced with larger datasets.
total=(869,868,871,873,873,874,870,871,868)
rate=(11.047181,5.529954,8.266361,6.872852,8.247423,9.610984,5.517241,8.266361,2.764977)
def f(total,rate):
cum = []
for i in range(len(total)):
if i == 0:
cum.append(total[i])
else:
cum.append(float(cum[i-1])-(float(total[i-1])*(rate[i-1]/1200.0)))
return cum
f(total, rate)
Returns:
869
860.9999997591667
856.9999996991667
850.99999934
845.9999995100001
839.9999992775
832.9999992641667
828.9999995391668
822.9999991800001
Upvotes: 5
Views: 9676
Reputation: 109526
Perhaps this?
df = df.assign(
Cumulative_Total=df['Total'].iat[0]
- ((df['Total'] * df['Annualized Rate'].div(1200))
.shift()
.fillna(0)
.cumsum())
)
>>> df
Total Annualized Rate Cumulative_Total
0 869 11.047181 869
1 868 5.529954 861
2 871 8.266361 857
3 873 6.872852 851
4 873 8.247423 846
5 874 9.610984 840
6 870 5.517241 833
7 871 8.266361 829
8 868 2.764977 823
Upvotes: 3
Reputation: 15423
Would this work? In this solution, I used x.name
to get the row index.
df['Cumulative Total'] = df.apply(lambda x: x['Total'] if x.name == 0 else x.shift()['Cumulative Total']-(x.shift()['Total']*(x.shift()['Annualized Rate']/1200)),axis=1)
Upvotes: 0