Recurrence relation in Pandas

Question

I have a pandas DataFrame, df, with series df.A and df.B, and I am trying to create a third series, df.C, that depends on A and B as well as on the previous result. That is:

C[0]=A[0]

C[n]=A[n] + B[n]*C[n-1]

What is the most efficient way of doing this? Ideally, I wouldn't have to fall back to a for loop.

Edit

This is the desired output for C given A and B. Now just need to figure out how...

import pandas as pd

a = [ 2, 3,-8,-2, 1]
b = [ 1, 1, 4, 2, 1]
c = [ 2, 5,12,22,23]

df = pd.DataFrame({'A': a, 'B': b, 'C': c})
df

piRSquared · Accepted Answer

You can vectorize this with obnoxious cumulative products and zipping together of other vectors. But it won't end up saving you time. As a matter of fact, it will likely be numerically unstable.
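For illustration, here is a minimal sketch of what such a vectorized form might look like, assuming float inputs and no zeros in B after the first row. The function name recurrence_cumprod is made up for this example. It relies on the closed form C[n] = P[n] * sum(A[k] / P[k] for k <= n), where P[n] is the running product B[1] * ... * B[n] and P[0] = 1.

```python
import numpy as np

def recurrence_cumprod(a, b):
    """Closed-form evaluation of C[0] = A[0], C[n] = A[n] + B[n] * C[n-1].

    Hypothetical helper for illustration only; assumes b[1:] has no zeros.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # P[0] = 1 and P[n] = b[1] * ... * b[n]; b[0] never enters the recurrence.
    p = np.cumprod(np.concatenate(([1.0], b[1:])))
    # C[n] = P[n] * sum_{k <= n} a[k] / P[k]
    return p * np.cumsum(a / p)

print(recurrence_cumprod([2, 3, -8, -2, 1], [1, 1, 4, 2, 1]))
```

On the sample data this reproduces the desired C column. The weakness is the running product p: it can overflow or underflow on long series, and any zero in B after the first row makes p zero from there on, so a / p divides by zero.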

Instead, you can use numba to speed up your loop.

from numba import njit
import numpy as np
import pandas as pd

@njit
def dynamic_alpha(a, b):
    # C[0] = A[0]; note that c inherits the dtype of a
    c = a.copy()
    for i in range(1, len(a)):
        # C[n] = A[n] + B[n] * C[n-1]
        c[i] = a[i] + b[i] * c[i - 1]
    return c

df.assign(C=dynamic_alpha(df.A.values, df.B.values))

   A  B   C
0  2  1   2
1  3  1   5
2 -8  4  12
3 -2  2  22
4  1  1  23

For this simple calculation, the numba version will be about as fast as a simple vectorized assignment such as

df.assign(C=np.arange(len(df)) ** 2 + 2)

df = pd.concat([df] * 10000)
%timeit df.assign(C=dynamic_alpha(df.A.values, df.B.values))
%timeit df.assign(C=np.arange(len(df)) ** 2 + 2)

337 µs ± 5.87 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
333 µs ± 20.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
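If numba is not available, or you want an independent check of the compiled result, a pure-Python reference can be sketched with itertools.accumulate. The name recurrence_reference is made up for this example, and the initial= keyword requires Python 3.8 or later. This is far slower than the numba version on large frames and is only meant for verification.

```python
from itertools import accumulate

def recurrence_reference(a, b):
    """Plain-Python C[0] = A[0], C[n] = A[n] + B[n] * C[n-1].

    Hypothetical helper for checking results; not optimized.
    """
    a = list(a)
    b = list(b)
    if not a:
        return []
    # Each step takes the previous C value and the next (A[n], B[n]) pair.
    pairs = zip(a[1:], b[1:])
    return list(accumulate(pairs, lambda prev, ab: ab[0] + ab[1] * prev, initial=a[0]))

print(recurrence_reference([2, 3, -8, -2, 1], [1, 1, 4, 2, 1]))
```

On the sample A and B this returns the same values as the desired C column, so it can be compared element by element against the output of dynamic_alpha.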
