serranzau
serranzau

Reputation: 75

Sequential Calculation of Pandas Column without For Loop

I have the sample dataframe below

        perc    2018_norm
0   0.009069    27.799849
1   0.011384    0.00
2   -0.000592   0.00
3   -0.002667   0.00

The value of the first row of 2018_norm comes from another DataFrame. I then want to calculate the value of the second row through the end of the DataFrame of the 2018_norm column using the percentage change in the perc column and previous row's value in 2018_norm column, which I can currently achieve using a For Loop to give the following result:

       perc      2018_norm
0   0.009069    27.799849
1   0.011384    28.116324
2   -0.000592   28.099667
3   -0.002667   28.024713
4   -0.006538   27.841490

For Loops on DataFrames are just slow so I know I am missing something basic but my google searching hasn't yielded what I am looking for.

I've tried variations of y1df['2018_norm'].iloc[1:] = (y1df['perc'] * y1df['2018_norm'].shift(1)) + y1df['2018_norm'].shift(1) that just yield:

       perc      2018_norm
0   0.009069    27.799849
1   0.011384    28.116324
2   -0.000592   0.00
3   -0.002667   0.00
4   -0.006538   0.00`

What am I missing?

EDIT: To clarify, a basic For loop and df.iloc were not preferable and a for loop with iterrows sped the computation up substantially such that a for loop using that function is a great solution for my use. Wen-Ben's respone also directly answers the question I didn't mean to ask in my original post.

Upvotes: 2

Views: 758

Answers (2)

BENY
BENY

Reputation: 323226

This is just cumprod

s=(df.perc.shift(-1).fillna(1)+1).cumprod().shift().fillna(1)*df['2018_norm'].iloc[0]
df['2018_norm']=s
df
Out[390]: 
       perc  2018_norm
0  0.009069  27.799849
1  0.011384  28.116322
2 -0.000592  28.099678
3 -0.002667  28.024736

Upvotes: 3

Nathaniel
Nathaniel

Reputation: 3290

You can use df.iterrows() to loop much more quickly through a pandas data frame:

for idx, row in y1df.iterrows():
    if idx > 0: # Skip first row
        y1df.loc[idx, '2018_norm'] = (1 + row['perc']) * y1df.loc[idx-1, '2018_norm']

print(y1df)

           perc  2018_norm
    0  0.009069  27.799849
    1  0.011384  28.116322
    2 -0.000592  28.099678
    3 -0.002667  28.024736

Upvotes: 4

Related Questions