Reputation: 75
I have the sample dataframe below
perc 2018_norm
0 0.009069 27.799849
1 0.011384 0.00
2 -0.000592 0.00
3 -0.002667 0.00
The value of the first row of 2018_norm
comes from another DataFrame. I then want to calculate the value of the second row through the end of the DataFrame of the 2018_norm
column using the percentage change in the perc
column and previous row's value in 2018_norm
column, which I can currently achieve using a For Loop to give the following result:
perc 2018_norm
0 0.009069 27.799849
1 0.011384 28.116324
2 -0.000592 28.099667
3 -0.002667 28.024713
4 -0.006538 27.841490
For Loops on DataFrames are just slow so I know I am missing something basic but my google searching hasn't yielded what I am looking for.
I've tried variations of y1df['2018_norm'].iloc[1:] = (y1df['perc'] * y1df['2018_norm'].shift(1)) + y1df['2018_norm'].shift(1)
that just yield:
perc 2018_norm
0 0.009069 27.799849
1 0.011384 28.116324
2 -0.000592 0.00
3 -0.002667 0.00
4 -0.006538 0.00`
What am I missing?
EDIT: To clarify, a basic For loop and df.iloc
were not preferable and a for loop with iterrows sped the computation up substantially such that a for loop using that function is a great solution for my use. Wen-Ben's respone also directly answers the question I didn't mean to ask in my original post.
Upvotes: 2
Views: 758
Reputation: 323226
This is just cumprod
s=(df.perc.shift(-1).fillna(1)+1).cumprod().shift().fillna(1)*df['2018_norm'].iloc[0]
df['2018_norm']=s
df
Out[390]:
perc 2018_norm
0 0.009069 27.799849
1 0.011384 28.116322
2 -0.000592 28.099678
3 -0.002667 28.024736
Upvotes: 3
Reputation: 3290
You can use df.iterrows() to loop much more quickly through a pandas data frame:
for idx, row in y1df.iterrows():
if idx > 0: # Skip first row
y1df.loc[idx, '2018_norm'] = (1 + row['perc']) * y1df.loc[idx-1, '2018_norm']
print(y1df)
perc 2018_norm
0 0.009069 27.799849
1 0.011384 28.116322
2 -0.000592 28.099678
3 -0.002667 28.024736
Upvotes: 4