Alex
Alex

Reputation: 1131

loop over column with lambda and calculate with values from other column

hi I have the following Dataframe

import pandas as pd
d = {'col1': [0.02,0.12,-0.1,0-0.07,0.01]}
df = pd.DataFrame(data=d)

df['new'] = ''
df['new'].iloc[0] = 100

df

I tried to calculate (beginning in row 1) in column 'new' the previous value divided by the value of 'col1'+ 1.

For example in row one, column new: 100/(0.12+1) = 89,285

For example in row two, column new: 89,285/(-0.10+1) = 99,206 and so on

I already tried to use a lambda function - without success. Thanks for help

Upvotes: 0

Views: 1193

Answers (3)

jezrael
jezrael

Reputation: 862741

I think numba is way how working with loops here if performance is important:

from numba import jit

d = {'col1': [0.02,0.12,-0.1,0-0.07,0.01]}
df = pd.DataFrame(data=d)

df.loc[0, 'new'] = 100

@jit(nopython=True)
def f(a, b):
    for i in range(1, a.shape[0]):
        a[i] = a[i-1] / (b[i] +1)
    return a

df['new'] = f(df['new'].to_numpy(), df['col1'].to_numpy())
print (df)
   col1         new
0  0.02  100.000000
1  0.12   89.285714
2 -0.10   99.206349
3 -0.07  106.673494
4  0.01  105.617321

Performance for 5000 rows:

d = {'col1': [0.02,0.12,-0.1,0-0.07,0.01]}
df = pd.DataFrame(data=d)
df = pd.concat([df] * 1000, ignore_index=True)

In [168]: %timeit df['new'] = f(df['new'].to_numpy(), df['col1'].to_numpy())
277 µs ± 11.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [169]: %%timeit
     ...: for i in range(1,df.shape[0]):
     ...:     prev = df['new'].iloc[i-1]
     ...:     df['new'].iloc[i] = prev/(df['col1'].iloc[i]+1)
     ...:     
1.31 s ± 20.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [170]: %%timeit
     ...: for i_row, row in df.iloc[1:, ].iterrows():
     ...:     df.loc[i_row, 'new'] = df.loc[i_row - 1, 'new'] / (row['col1'] + 1)
     ...:     
2.08 s ± 93.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Upvotes: 2

Himanshu
Himanshu

Reputation: 676

Try this:

df['new'].iloc[0] = 100

for i in range(1,df.shape[0]):
    prev = df['new'].iloc[i-1]
    df['new'].iloc[i] = prev/(df['col1'].iloc[i]+1)

Output:

col1        new
-------------------
0   0.02    100
1   0.12    89.2857
2   -0.10   99.2063
3   -0.07   106.673
4   0.01    105.617

Upvotes: 2

Oleg O
Oleg O

Reputation: 1065

I don't see any vectorized solution. Here's a sheer loop:

df['new'] = 100
for i_row, row in df.iloc[1:, ].iterrows():
    df.loc[i_row, 'new'] = df.loc[i_row - 1, 'new'] / (row['col1'] + 1)

Upvotes: 0

Related Questions