baobobs
baobobs

Reputation: 703

Pandas - lambda function with conditional based on row index

I am trying to apply a lambda function to a dataframe by referencing three columns. I want to update one of the columns, Cumulative Total, based on the following logic:

If it's on the first row, then Cumulative Total should equal the value in Total. If it's not the first row, then apply the following formula that references the prior row:

x.shift()['Cumulative Total'] - (x.shift()['Total'] * (x.shift()['Annualized Rate'] / 1200))

I want the Cumulative Total column to look like so:

Total   Annualized Rate Cumulative Total
869     11.04718067     869
868     5.529953917     861
871     8.266360505     857
873     6.872852234     851
873     8.24742268      846
874     9.610983982     840
870     5.517241379     833
871     8.266360505     829
868     2.764976959     823

What is throwing me off is how I can determine whether or not I'm on the first row. This sounds rather trivial, but I'm very new to Pandas and am totally stumped. iloc doesn't seem to work, as it seems to only be used for grabbing a row of a given index.

The code is currently as follows:

df['Cumulative Total'] = df.apply(lambda x: x['Total'] if x.iloc[0] else x.shift()['Cumulative Total']-(x.shift()['Total']*(x.shift()['Annualized Rate']/1200)),axis=1)

The statement if x.iloc[0] is wrong. Any idea on how I can determine if it's the first row?

Edit: thank you all for your answers. Alexander's answer is on the right track, but I've noticed that the results strayed somewhat from what was to be expected. These differences became more pronounced the larger the dataframe used.

Alexander - can you address this issue with an edit to your answer? Using vanilla Python, I've arrived at the results below. The differences are largely trivial, but as stated, can get more pronounced with larger datasets.

total=(869,868,871,873,873,874,870,871,868)
rate=(11.047181,5.529954,8.266361,6.872852,8.247423,9.610984,5.517241,8.266361,2.764977)
def f(total,rate):
     cum = []
     for i in range(len(total)):
         if i == 0:
             cum.append(total[i])
         else:
             cum.append(float(cum[i-1])-(float(total[i-1])*(rate[i-1]/1200.0)))
     return cum
f(total, rate)

Returns:

869
860.9999997591667
856.9999996991667
850.99999934
845.9999995100001
839.9999992775
832.9999992641667
828.9999995391668
822.9999991800001

Upvotes: 5

Views: 9676

Answers (2)

Alexander
Alexander

Reputation: 109526

Perhaps this?

df = df.assign(
    Cumulative_Total=df['Total'].iat[0] 
                     - ((df['Total'] * df['Annualized Rate'].div(1200))
                        .shift()
                        .fillna(0)
                        .cumsum())
)

>>> df
   Total  Annualized Rate  Cumulative_Total
0    869        11.047181               869
1    868         5.529954               861
2    871         8.266361               857
3    873         6.872852               851
4    873         8.247423               846
5    874         9.610984               840
6    870         5.517241               833
7    871         8.266361               829
8    868         2.764977               823

Upvotes: 3

Julien Spronck
Julien Spronck

Reputation: 15423

Would this work? In this solution, I used x.name to get the row index.

df['Cumulative Total'] = df.apply(lambda x: x['Total'] if x.name == 0 else x.shift()['Cumulative Total']-(x.shift()['Total']*(x.shift()['Annualized Rate']/1200)),axis=1)

Upvotes: 0

Related Questions