Big AL

Reputation: 543

Recurrence relation in Pandas

I have a pandas DataFrame, df, with columns (Series) df.A and df.B, and I'm trying to create a third column, df.C, that depends on A and B as well as on the previous value of C. That is:

C[0]=A[0]

C[n]=A[n] + B[n]*C[n-1]

What is the most efficient way to do this? Ideally, I wouldn't have to fall back to a for loop.


Edit

This is the desired output for C, given A and B. Now I just need to figure out how to compute it...

import pandas as pd

a = [ 2, 3,-8,-2, 1]
b = [ 1, 1, 4, 2, 1]
c = [ 2, 5,12,22,23]

df = pd.DataFrame({'A': a, 'B': b, 'C': c})
df
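For reference, a plain-Python loop that follows the recurrence literally reproduces the expected C column, which makes it a handy correctness check for any faster version. This is only a sketch: the function name `recurrence_loop` is hypothetical, and it assumes A is non-empty.

```python
import pandas as pd

def recurrence_loop(a, b):
    """Hypothetical helper: C[0] = A[0], C[n] = A[n] + B[n] * C[n-1]."""
    c = [a[0]]
    for n in range(1, len(a)):
        # each step needs the value computed in the previous step
        c.append(a[n] + b[n] * c[n - 1])
    return c

a = [2, 3, -8, -2, 1]
b = [1, 1, 4, 2, 1]
df = pd.DataFrame({'A': a, 'B': b})
df['C'] = recurrence_loop(df['A'].tolist(), df['B'].tolist())
print(df['C'].tolist())  # [2, 5, 12, 22, 23]
```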

Upvotes: 5

Views: 813

Answers (2)

Stef Reyes

Reputation: 33

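try this (a list comprehension can't read back the elements it is still building, so itertools.accumulate is used to carry the previous C forward; initial= needs Python 3.8+):

from itertools import accumulate
C = list(accumulate(zip(A[1:], B[1:]), lambda c, ab: ab[0] + ab[1] * c, initial=A[0]))  # C[0] = A[0]

It's more compact than an explicit loop, though it still processes one element at a time in pure Python.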

Upvotes: -1

piRSquared

Reputation: 294258

You can vectorize this with some unwieldy cumulative products combined with the other vectors. But it won't end up saving you time, and it is likely to be numerically unstable.
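To make the cumulative-product idea concrete: unrolling the recurrence gives C[n] = sum over k <= n of A[k] * B[k+1] * ... * B[n]. With P[0] = 1 and P[n] = B[1] * ... * B[n], this becomes C[n] = P[n] * sum over k <= n of A[k] / P[k], which NumPy can evaluate with cumprod and cumsum. The sketch below (the function name `recurrence_cumprod` is hypothetical) shows that form and why it is fragile: it divides by a running product, so a single zero in B produces nan, and long products can overflow or underflow.

```python
import numpy as np

def recurrence_cumprod(a, b):
    """Hypothetical helper: vectorized C[n] = A[n] + B[n] * C[n-1], C[0] = A[0].

    Uses C[n] = P[n] * cumsum(A / P)[n] with P[0] = 1, P[n] = B[1] * ... * B[n].
    Not robust: it divides by the running product P.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # b[0] never enters the recurrence, so the running product starts from 1.0
    p = np.cumprod(np.concatenate(([1.0], b[1:])))
    return p * np.cumsum(a / p)

# same values as c = [2, 5, 12, 22, 23] from the question, as floats
print(recurrence_cumprod([2, 3, -8, -2, 1], [1, 1, 4, 2, 1]))

# A zero in B makes P zero from then on: a / p gives inf and 0 * inf gives nan
with np.errstate(divide='ignore', invalid='ignore'):
    print(recurrence_cumprod([1, 2, 3], [1, 0, 3]))  # nan from index 1 on; true C is [1, 2, 9]
```

The loop-based approaches never divide, so they don't have this failure mode.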

Instead, you can use numba to speed up your loop.

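from numba import njit
import numpy as np
import pandas as pd

@njit
def dynamic_alpha(a, b):
    # c[0] = a[0]; copying also gives c the same dtype as a,
    # so pass float arrays if B (or the result) is not integer-valued
    c = a.copy()
    for i in range(1, len(a)):
        # C[i] = A[i] + B[i] * C[i-1], compiled to machine code by numba
        c[i] = a[i] + b[i] * c[i - 1]
    return c

# df is the DataFrame from the question
df.assign(C=dynamic_alpha(df.A.values, df.B.values))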

   A  B   C
0  2  1   2
1  3  1   5
2 -8  4  12
3 -2  2  22
4  1  1  23

For this simple calculation, this is about as fast as a trivial vectorized assignment such as

df.assign(C=np.arange(len(df)) ** 2 + 2)

df = pd.concat([df] * 10000)
%timeit df.assign(C=dynamic_alpha(df.A.values, df.B.values))
%timeit df.assign(C=np.arange(len(df)) ** 2 + 2)

337 µs ± 5.87 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
333 µs ± 20.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Upvotes: 5
