trevor-pope

Reputation: 116

Vectorizing a for loop in Python that includes a cumsum() and iterative slicing

I have the following for loop that operates over three numpy arrays of the same length:

import numpy as np

n = 100
a = np.random.random(n)
b = np.random.random(n)
c = np.random.random(n)

valid = np.empty(n)
for i in range(n):
    valid[i] = np.any(a[i] > b[i:] + c[i:].cumsum())

Is there a way to replace this for loop with some vectorized numpy operations?

For example, because I only care if a[i] is larger than any value in b[i:], I can do np.minimum.accumulate(b[::-1])[::-1] which gets the smallest value of b at every index and onwards, and then compare it to a like this:

a > np.minimum.accumulate(b[::-1])[::-1]

but I still would need a way to vectorize the c[i:].cumsum() into a single array calculation.
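To make the trick concrete, here is a tiny example of that suffix minimum (illustrative values, not my real data):

import numpy as np

b = np.array([3.0, 1.0, 2.0, 5.0])
# smallest value of b from each index onward (suffix minimum)
print(np.minimum.accumulate(b[::-1])[::-1])   # [1. 1. 2. 5.]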

Upvotes: 2

Views: 502

Answers (2)

Roim

Reputation: 3066

I assume you want to vectorize the loop to decrease the running time. Since you are only using pure NumPy operations, you can use Numba: see 5 Minutes Guide to Numba.

It will look something like this:

import numba
import numpy as np

@numba.njit()
def valid_for_single_idx(idx, a, b, c):
    # same check as one iteration of your loop, compiled by Numba
    return np.any(a[idx] > b[idx:] + c[idx:].cumsum())


valid = np.empty(n)
for i in range(n):
    valid[i] = valid_for_single_idx(i, a, b, c)

So far this isn't really vectorization (the loop still happens), but it translates the NumPy line into LLVM, so it runs about as fast as possible.
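If you want to go one step further, you can also move the whole loop inside the jitted function, so the Python-level call overhead per index disappears as well. This is just a sketch of that variation, assuming your arrays are plain NumPy arrays:

import numba
import numpy as np

@numba.njit()
def valid_all(a, b, c):
    # jit-compiled version of the entire original loop
    out = np.empty(a.shape[0], dtype=np.bool_)
    for i in range(a.shape[0]):
        out[i] = np.any(a[i] > b[i:] + c[i:].cumsum())
    return out

valid = valid_all(a, b, c)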

Although it doesn't increase the speed, it looks a bit nicer if you use map:

import numba
import numpy as np
from functools import partial

@numba.njit()
def valid_for_single_idx(idx, a, b, c):
    return np.any(a[idx] > b[idx:] + c[idx:].cumsum())


valid = map(partial(valid_for_single_idx, a=a, b=b, c=c), range(n))
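Note that map is lazy, so valid above is a map object rather than an array. If you need an actual NumPy array, you can collect it, for example:

import numpy as np
from functools import partial

# map() is evaluated lazily; collect the results into a real boolean array
valid = np.fromiter(map(partial(valid_for_single_idx, a=a, b=b, c=c), range(n)), dtype=bool)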

Upvotes: 2

Mad Physicist

Reputation: 114440

Your goal is to find the minimum of b[i:] + c[i:].cumsum() for each i. Clearly you can compare that to a directly.

You can write the elements of c[i:].cumsum() as the upper triangle of a matrix. Let's look at a toy case with n = 3:

c = [c1, c2, c3]
s1 = c.cumsum()
s0 = np.r_[0, s1[:-1]]

You can write the elements of the cumulative sum as

c1, c1 + c2, c1 + c2 + c3     s1[0:]                 s1[0:] - s0[0]
         c2,      c2 + c3  =  s1[1:] - c1         =  s1[1:] - s0[1]
                       c3     s1[2:] - (c1 + c2)     s1[2:] - s0[2]
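To make the identity concrete, here is a quick check with made-up numbers (illustrative only): row i of the triangle is exactly s1[i:] - s0[i].

import numpy as np

c = np.array([1.0, 2.0, 3.0])   # stand-ins for c1, c2, c3
s1 = c.cumsum()                 # [1., 3., 6.]
s0 = np.r_[0, s1[:-1]]          # [0., 1., 3.]

# each tail cumsum c[i:].cumsum() equals s1[i:] - s0[i]
for i in range(len(c)):
    assert np.allclose(c[i:].cumsum(), s1[i:] - s0[i])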

You can use np.triu_indices to construct these sums as a raveled array:

r, c = np.triu_indices(n)    # note: this rebinds c to the column indices (s1 and s0 were already computed)
diff = s1[c] - s0[r] + b[c]

Since np.minimum is a ufunc, you can reduce diff over each run defined by r using np.minimum.reduceat. The run start locations are given roughly by np.flatnonzero(np.diff(r)) + 1, but you can generate them faster with np.arange:

m = np.minimum.reduceat(diff, np.r_[0, np.arange(n, 1, -1).cumsum()])
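As a sanity check (with a small, made-up size rather than your n), both expressions give the same run starts:

import numpy as np

# standalone check; does not reuse the r, c, m above
k = 5
rows_k, _ = np.triu_indices(k)
starts_slow = np.r_[0, np.flatnonzero(np.diff(rows_k)) + 1]
starts_fast = np.r_[0, np.arange(k, 1, -1).cumsum()]
assert np.array_equal(starts_slow, starts_fast)   # both give [0, 5, 9, 12, 14]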

So finally, you have:

valid = a > m

TL;DR

s1 = c.cumsum()
s0 = np.r_[0, s1[:-1]]
r, c = np.triu_indices(n)
valid = a > np.minimum.reduceat(s1[c] - s0[r] + b[c], np.r_[0, np.arange(n, 1, -1).cumsum()])
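If you want to verify this against your original loop, here is a quick self-contained check with random data (r and c renamed to rows and cols so the array c is not clobbered):

import numpy as np

n = 100
a, b, c = np.random.random((3, n))

# original loop from the question
valid_loop = np.empty(n, dtype=bool)
for i in range(n):
    valid_loop[i] = np.any(a[i] > b[i:] + c[i:].cumsum())

# vectorized version from the TL;DR
s1 = c.cumsum()
s0 = np.r_[0, s1[:-1]]
rows, cols = np.triu_indices(n)
m = np.minimum.reduceat(s1[cols] - s0[rows] + b[cols], np.r_[0, np.arange(n, 1, -1).cumsum()])

assert np.array_equal(valid_loop, a > m)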

Upvotes: 2
