How to vectorize a for loop containing if-statement in numpy?

Question

I am fairly new to numpy. I have the following code, which gives me the desired result:

import numpy as np

def get_result(S,K,delS):
    res=np.zeros(S.shape[0],dtype=np.float64)
    for j in range(res.shape[0]):
        if S[j]-K>delS:
            res[j]+=np.floor((S[j]-K)/delS)
            K+=np.floor((S[j]-K)/delS)*delS
        elif S[j]-K<-delS:
            res[j]+=np.ceil((S[j]-K)/delS)
            K+=np.ceil((S[j]-K)/delS)*delS
    return res

S=np.array([1.0,1.05,1.1,1.12,1.09,1.14,1.21,1.6,1.05,1.0,0.95,0.90,0.87,0.77,0.63,0.85,0.91,0.76],dtype=np.float64)

K=1.0
delS=0.1

l=get_result(S,K,delS)

for j in range(S.shape[0]):
    print("%d	%.2f	%.0f" % (j,S[j],l[j]))

The get_result function contains a for-loop, however, and is therefore awkwardly slow for larger input vectors S. Can such a function be vectorized in numpy-syntax? Any help would be appreciated.

hpaulj · Accepted Answer

A general pattern when dealing with 2 or more conditions in an array or calculation, is to construct a boolean mask, and take one action for elements where that mask is true, and different where it is false:

res=np.zeros(S.shape[0],dtype=np.float64)
mask - S-K>delS
res[mask] = ...S[mask]
res[~mask] = ...S[~mask]

A variant is to identify the indices, with np.where(mask).

But what appears to complicate your calculation is that test keeps changing. That is, K for j+1 derives from calculation for j.

for j in range(res.shape[0]):
    if S[j]-K>delS:
        res[j]+=np.floor((S[j]-K)/delS)
        K+=np.floor((S[j]-K)/delS)*delS
    elif S[j]-K<-delS:
        res[j]+=np.ceil((S[j]-K)/delS)
        K+=np.ceil((S[j]-K)/delS)*delS

To deal with that kind of iteration we usually try to use np.cumsum or other accumulative ufunc methods.

As a general rule, numpy code is fastest and easiest when the calculation can be applied to all elements of an array (or set of arrays) 'in parallel' - that is, in a way that does not depend on the order of iteration. Then we can delegate the action to fast compiled numpy functions like array addition and multiplication. If the computation is serial in nature (the value for j depending on j-1) this becomes trickier.

So if my cursory reading is right, it isn't the if-statement that is difficult, it's the serial nature of the loop.

=========================

Playing around, I was able to remove the if (there are actually 3 subcase), but it still has to be iterative:

def get_result(S,K,delS):
    S = S/delS
    res=np.zeros(S.shape[0],dtype=np.float64)
    for j in range(res.shape[0]):
        x = S[j] - K/delS
        xa = np.floor(np.abs(x)) * np.sign(x)
        res[j] += xa
        K += xa*delS         
    return res

How to vectorize a for loop containing if-statement in numpy?

Answers (1)

Related Questions