Reputation: 147
I found was a small code snippet that used to be a double for loop and I managed to bring it to a single for loop with vectorization. Having dones this resulted in a drastic time improvement so I was wondering if it is possible to get rid of the second for loop here via vectorization as well and if it would improve the performance.
import numpy as np
from timeit import default_timer as timer
nlin, npix = 478, 480
bb = np.random.rand(nlin,npix)
slope = -8
fac = 4
offset= 0
barray = np.zeros([2,2259]);
timex = timer()
for y in range(nlin):
for x in range(npix):
ling=(np.ceil((x-y/slope)*fac)+1-offset).astype(np.int);
barray[0,ling] +=1;
barray[1,ling] +=bb[y,x];
newVar = np.copy(barray)
print(timer() - timex)
So the ling can be taken out of the loops by creating the following matrix
lingMat = (np.ceil((np.vstack(npixrange)-nlinrange/slope)*fac)+1-offset).astype(np.int);
which satisfies lingMat[x,y] = "ling in the for loop at x and y". And this gives a first step of the vectorization.
Upvotes: 4
Views: 1872
Reputation: 3023
In terms of vectorization, you could potentially use something based on np.add.at:
def yaco_addat(bb,slope,fac,offset):
barray = np.zeros((2,2259),dtype=np.float64)
nlin_range = np.arange(nlin)
npix_range = np.arange(npix)
ling_mat = (np.ceil((npix_range-nlin_range[:,None]/slope)*fac)+1-offset).astype(np.int)
np.add.at(barray[0,:],ling_mat,1)
np.add.at(barray[1,:],ling_mat,bb)
return barray
However, I would suggest you to optimize this directly with numba, using @jit
decorator with option nopython=True
, which gives you:
import numpy as np
from numba import jit
nlin, npix = 478, 480
bb = np.random.rand(nlin,npix)
slope = -8
fac = 4
offset= 0
def yaco_plain(bb,slope,fac,offset):
barray = np.zeros((2,2259),dtype=np.float64)
for y in range(nlin):
for x in range(npix):
ling=(np.ceil((x-y/slope)*fac)+1-offset).astype(np.int)
barray[0,ling] += 1
barray[1,ling] += bb[y,x]
return barray
@jit(nopython=True)
def yaco_numba(bb,slope,fac,offset):
barray = np.zeros((2,2259),dtype=np.float64)
for y in range(nlin):
for x in range(npix):
ling = int((np.ceil((x-y/slope)*fac)+1-offset))
barray[0,ling] += 1
barray[1,ling] += bb[y,x]
return barray
Let's check the outputs
np.allclose(yaco_plain(bb,slope,fac,offset),yaco_addat(bb,slope,fac,offset))
>>> True
np.allclose(yaco_plain(bb,slope,fac,offset),yaco_jit(bb,slope,fac,offset))
>>> True
and now time these
%timeit yaco_plain(bb,slope,fac,offset)
>>> 648 ms ± 4.14 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit yaco_addat(bb,slope,fac,offset)
>>> 27.2 ms ± 92.3 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit yaco_jit(bb,slope,fac,offset)
>>> 505 µs ± 995 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
resulting in an optimized function that is way quicker than the initial 2 loops version and 53x
faster than the np.add.at
one. Hope this helps.
Upvotes: 2