Reputation: 11
I'm trying to find a fast way (preferably without the use of for
loops) that mimics the following code:
# a and b are arrays of the same length
# f is function of 3 scalar variables
res = np.zeros(len(a))
temp = 1
for i in range(len(a)):
temp = f(temp, a[i], b[i])
res[i] = temp
Note that the temp
variable is updated with each iteration. This piece of code is very similar to the functionality that can be achieved with NumPy's accumulate mechanism. However, in my case there are several arrays that are used in the function f
.
The function f
can be something like
def f(temp, a_scalar, b_scalar):
return temp * a_scalar + b_scalar
Upvotes: 1
Views: 303
Reputation: 114440
As with most vectorization problems, you would have to work out the solution on a per-function basis. For the example that you present, notice that res[i] = res[i - 1] * a[i] + b[i]
can be rewritten as the following sequence:
res[0] = a[0] + b[0]
res[1] = (a[0] + b[0]) * a[1] + b[1] = a[0] * a[1] + b[0] * a[1] + b[1]
res[2] = a[0] * a[1] * a[2] + b[0] * a[1] * a[2] + b[1] * a[2] + b[2]
...
res[n] = a[0] * ... * a[n] + b[0] * (a[1] * ... an]) + b[1] * (a[2] * ... * a[n]) + ... b[n-1] * a[n] + b[n]
These terms are the cumulative sum of the product of cumprod(a[:n:-1])
and b
. You can also think of the tail of the cumulative product of a
as cumprod(a) / cumprod(a[:n]
. So we have
res[n] = prod(a[:n]) + sum(prod(a[:n]) * b[:n] / cumprod(a[:n]))
Since this relation consists of products and quotients, we can separate it into
res[n] = cumprod(a)[n] * (1 + sum(b[:n] / cumprod(a[:n])))
The remaining recursive term can also be rewritten as a cumulative sum of cumulative products:
res[n] = cumprod(a)[n] * (1 + cumsum(b / cumprod(a))[n])
The recursion has now been separated into a series of accumulations supported by numpy.
Warning
Keep in mind that the separation will be much less numerically stable than the original formulation. If this is important, use the original recursion and compile with numba, or just write the code in C.
Example
np.random.seed(0xBEEF)
a = np.random.uniform(-2, 2, size=10)
b = np.random.uniform(-3, 3, size=10)
res = np.cumprod(a) * (1 + np.cumsum(b / np.cumprod(a)))
The original implementation in the question and the code shown here produce nearly identical results:
array([-0.04044657, 2.5090314 , 0.66147071, -1.40795856, 0.93920848,
-1.75817347, -4.45795915, 2.2258663 , 2.38540097, 1.43741821])
The differences are miniscule, on the order of 1e-16 at most, caused by the loss of precision due to accumulation vs incremental updating.
Upvotes: 1