numpy apply_along_axis vectorisation

Question

I am trying to implement a function that takes each 1-D slice along axis 0 of a 2D numpy array (i.e. each column) and returns a scalar result of a certain calculation. My current code looks like the following:

import numpy as np

img = np.array([
    [0,  5,  70, 0,  0,  0 ],
    [10, 50, 4,  4,  2,  0 ],
    [50, 10, 1,  42, 40, 1 ], 
    [10, 0,  0,  6,  85, 64],
    [0,  0,  0,  1,  2,  90]]
)

def get_y(stride):
    stride_vals = stride[stride > 0]
    pix_thresh = stride_vals.max() - 1.5*stride_vals.std()
    return np.argwhere(stride>pix_thresh).mean()

np.apply_along_axis(get_y, 0, img)
>> array([ 2. ,  1. ,  0. ,  2. ,  2.5,  3.5])

It works as expected, but performance isn't great: in the real dataset each frame has ~2k rows and ~20-50 columns, and frames arrive 60 times a second.

Is there a way to speed this up, perhaps by not using the np.apply_along_axis function?

Divakar · Accepted Answer

Here's one vectorized approach: set the zeros to NaN, which lets us use np.nanmax and np.nanstd to compute the max and std values while ignoring the zeros. This matches the stride > 0 filter in get_y as long as the array holds no negative values:

imgn = np.where(img==0, np.nan, img)
mx = np.nanmax(imgn,0) # np.max(img,0) if all are positive numbers
st = np.nanstd(imgn,0)
mask = img > mx - 1.5*st
out = np.arange(mask.shape[0]).dot(mask)/mask.sum(0)
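
As a sanity check, the steps above can be wrapped in a function and compared against the original np.apply_along_axis version on the sample array from the question. This is only a sketch: the name get_y_vectorized is made up here, and it assumes the input has no negative values and that every column contains at least one non-zero entry.

```python
import numpy as np

img = np.array([
    [0,  5,  70, 0,  0,  0 ],
    [10, 50, 4,  4,  2,  0 ],
    [50, 10, 1,  42, 40, 1 ],
    [10, 0,  0,  6,  85, 64],
    [0,  0,  0,  1,  2,  90],
])

def get_y(stride):
    # Reference implementation from the question (handles one column at a time).
    stride_vals = stride[stride > 0]
    pix_thresh = stride_vals.max() - 1.5 * stride_vals.std()
    return np.argwhere(stride > pix_thresh).mean()

def get_y_vectorized(img):
    # Hypothetical wrapper name; assumes non-negative values and no all-zero column.
    imgn = np.where(img == 0, np.nan, img)  # zeros become NaN so they are ignored
    mx = np.nanmax(imgn, axis=0)            # per-column max of the non-zero values
    st = np.nanstd(imgn, axis=0)            # per-column std of the non-zero values
    mask = img > mx - 1.5 * st              # True where a pixel exceeds its column threshold
    row_idx = np.arange(mask.shape[0])
    # Sum of the row indices that pass the threshold, divided by how many pass,
    # gives the mean row index for each column.
    return row_idx.dot(mask) / mask.sum(axis=0)

expected = np.apply_along_axis(get_y, 0, img)
result = get_y_vectorized(img)
print(np.allclose(result, expected))
```

The dot product is what replaces np.argwhere(...).mean(): multiplying the row indices by the boolean mask sums the indices of the rows that pass the threshold, and mask.sum(0) counts them.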

Runtime test -

In [94]: img = np.random.randint(-100,100,(2000,50))

In [95]: %timeit np.apply_along_axis(get_y, 0, img)
100 loops, best of 3: 4.36 ms per loop

In [96]: %%timeit
    ...: imgn = np.where(img==0, np.nan, img)
    ...: mx = np.nanmax(imgn,0)
    ...: st = np.nanstd(imgn,0)
    ...: mask = img > mx - 1.5*st
    ...: out = np.arange(mask.shape[0]).dot(mask)/mask.sum(0)
1000 loops, best of 3: 1.33 ms per loop

Thus, we are seeing a speedup of a bit over 3x (4.36 ms vs 1.33 ms per call).
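
One caveat about the benchmark input: it includes negative values, where stride > 0 and img == 0 select different elements, so the two versions are timed on the same data but need not return the same numbers. Below is a rough sketch for re-running the comparison outside IPython on non-negative data, so the outputs can also be checked against each other. The function name get_y_vectorized, the value range, the seed and the repeat count are all arbitrary choices for this sketch, and timings will vary by machine.

```python
import timeit
import numpy as np

def get_y(stride):
    # Reference implementation from the question.
    stride_vals = stride[stride > 0]
    pix_thresh = stride_vals.max() - 1.5 * stride_vals.std()
    return np.argwhere(stride > pix_thresh).mean()

def get_y_vectorized(img):
    # Hypothetical wrapper around the vectorized steps; assumes non-negative input.
    imgn = np.where(img == 0, np.nan, img)
    mx = np.nanmax(imgn, axis=0)
    st = np.nanstd(imgn, axis=0)
    mask = img > mx - 1.5 * st
    return np.arange(mask.shape[0]).dot(mask) / mask.sum(axis=0)

# Non-negative test frame with the shape mentioned in the question.
rng = np.random.default_rng(0)
img = rng.integers(0, 100, size=(2000, 50))

loop_out = np.apply_along_axis(get_y, 0, img)
vec_out = get_y_vectorized(img)
same = np.allclose(loop_out, vec_out)

# Average time per call over a small number of repeats.
n = 20
t_loop = timeit.timeit(lambda: np.apply_along_axis(get_y, 0, img), number=n) / n
t_vec = timeit.timeit(lambda: get_y_vectorized(img), number=n) / n

print(f"outputs match: {same}")
print(f"apply_along_axis: {t_loop * 1e3:.2f} ms, vectorized: {t_vec * 1e3:.2f} ms")
```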
