Numpy Vectorization And Speedup

Question

I found was a small code snippet that used to be a double for loop and I managed to bring it to a single for loop with vectorization. Having dones this resulted in a drastic time improvement so I was wondering if it is possible to get rid of the second for loop here via vectorization as well and if it would improve the performance.

import numpy as np
from timeit import default_timer as timer
nlin, npix = 478, 480
bb = np.random.rand(nlin,npix)
slope = -8
fac = 4
offset= 0
barray = np.zeros([2,2259]);

timex = timer()
for y in range(nlin):
    for x in range(npix):
        ling=(np.ceil((x-y/slope)*fac)+1-offset).astype(np.int);
        barray[0,ling] +=1;
        barray[1,ling] +=bb[y,x];
newVar = np.copy(barray)
print(timer() - timex)

So the ling can be taken out of the loops by creating the following matrix

lingMat = (np.ceil((np.vstack(npixrange)-nlinrange/slope)*fac)+1-offset).astype(np.int);

which satisfies lingMat[x,y] = "ling in the for loop at x and y". And this gives a first step of the vectorization.

yacola · Accepted Answer

In terms of vectorization, you could potentially use something based on np.add.at:

def yaco_addat(bb,slope,fac,offset):
    barray = np.zeros((2,2259),dtype=np.float64)
    nlin_range = np.arange(nlin)
    npix_range = np.arange(npix)
    ling_mat = (np.ceil((npix_range-nlin_range[:,None]/slope)*fac)+1-offset).astype(np.int)  
    np.add.at(barray[0,:],ling_mat,1)
    np.add.at(barray[1,:],ling_mat,bb) 
    return barray

However, I would suggest you to optimize this directly with numba, using @jit decorator with option nopython=True, which gives you:

import numpy as np
from numba import jit

nlin, npix = 478, 480
bb = np.random.rand(nlin,npix)
slope = -8
fac = 4
offset= 0

def yaco_plain(bb,slope,fac,offset):
    barray = np.zeros((2,2259),dtype=np.float64)
    for y in range(nlin):
        for x in range(npix):
            ling=(np.ceil((x-y/slope)*fac)+1-offset).astype(np.int)
            barray[0,ling] += 1
            barray[1,ling] += bb[y,x]
    return barray

@jit(nopython=True)
def yaco_numba(bb,slope,fac,offset):
    barray = np.zeros((2,2259),dtype=np.float64)
    for y in range(nlin):
        for x in range(npix):
            ling = int((np.ceil((x-y/slope)*fac)+1-offset))
            barray[0,ling] += 1
            barray[1,ling] += bb[y,x]    
    return barray

Let's check the outputs

np.allclose(yaco_plain(bb,slope,fac,offset),yaco_addat(bb,slope,fac,offset))
>>> True
np.allclose(yaco_plain(bb,slope,fac,offset),yaco_jit(bb,slope,fac,offset))
>>> True

and now time these

%timeit yaco_plain(bb,slope,fac,offset)
>>> 648 ms ± 4.14 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit yaco_addat(bb,slope,fac,offset)
>>> 27.2 ms ± 92.3 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit yaco_jit(bb,slope,fac,offset)
>>> 505 µs ± 995 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

resulting in an optimized function that is way quicker than the initial 2 loops version and 53x faster than the np.add.at one. Hope this helps.

Numpy Vectorization And Speedup

Answers (1)

Related Questions