Reputation: 1131
hi I have the following Dataframe
import pandas as pd
d = {'col1': [0.02,0.12,-0.1,0-0.07,0.01]}
df = pd.DataFrame(data=d)
df['new'] = ''
df['new'].iloc[0] = 100
df
I tried to calculate (beginning in row 1) in column 'new' the previous value divided by the value of 'col1'+ 1.
For example in row one, column new: 100/(0.12+1) = 89,285
For example in row two, column new: 89,285/(-0.10+1) = 99,206 and so on
I already tried to use a lambda function - without success. Thanks for help
Upvotes: 0
Views: 1193
Reputation: 862741
I think numba is way how working with loops here if performance is important:
from numba import jit
d = {'col1': [0.02,0.12,-0.1,0-0.07,0.01]}
df = pd.DataFrame(data=d)
df.loc[0, 'new'] = 100
@jit(nopython=True)
def f(a, b):
for i in range(1, a.shape[0]):
a[i] = a[i-1] / (b[i] +1)
return a
df['new'] = f(df['new'].to_numpy(), df['col1'].to_numpy())
print (df)
col1 new
0 0.02 100.000000
1 0.12 89.285714
2 -0.10 99.206349
3 -0.07 106.673494
4 0.01 105.617321
Performance for 5000 rows:
d = {'col1': [0.02,0.12,-0.1,0-0.07,0.01]}
df = pd.DataFrame(data=d)
df = pd.concat([df] * 1000, ignore_index=True)
In [168]: %timeit df['new'] = f(df['new'].to_numpy(), df['col1'].to_numpy())
277 µs ± 11.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [169]: %%timeit
...: for i in range(1,df.shape[0]):
...: prev = df['new'].iloc[i-1]
...: df['new'].iloc[i] = prev/(df['col1'].iloc[i]+1)
...:
1.31 s ± 20.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [170]: %%timeit
...: for i_row, row in df.iloc[1:, ].iterrows():
...: df.loc[i_row, 'new'] = df.loc[i_row - 1, 'new'] / (row['col1'] + 1)
...:
2.08 s ± 93.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Upvotes: 2
Reputation: 676
Try this:
df['new'].iloc[0] = 100
for i in range(1,df.shape[0]):
prev = df['new'].iloc[i-1]
df['new'].iloc[i] = prev/(df['col1'].iloc[i]+1)
Output:
col1 new
-------------------
0 0.02 100
1 0.12 89.2857
2 -0.10 99.2063
3 -0.07 106.673
4 0.01 105.617
Upvotes: 2
Reputation: 1065
I don't see any vectorized solution. Here's a sheer loop:
df['new'] = 100
for i_row, row in df.iloc[1:, ].iterrows():
df.loc[i_row, 'new'] = df.loc[i_row - 1, 'new'] / (row['col1'] + 1)
Upvotes: 0