ymoiseev

Reputation: 436

numpy apply_along_axis vectorisation

I am trying to implement a function that takes each column of a 2D NumPy array and returns a scalar result of a certain calculation. My current code looks like this:

import numpy as np

img = np.array([
    [0,  5,  70, 0,  0,  0 ],
    [10, 50, 4,  4,  2,  0 ],
    [50, 10, 1,  42, 40, 1 ], 
    [10, 0,  0,  6,  85, 64],
    [0,  0,  0,  1,  2,  90]]
)

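def get_y(stride):
    # Use only the positive values to compute the threshold
    stride_vals = stride[stride > 0]
    pix_thresh = stride_vals.max() - 1.5*stride_vals.std()
    # Mean index of the entries above the threshold
    return np.argwhere(stride>pix_thresh).mean()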

np.apply_along_axis(get_y, 0, img)
>> array([ 2. ,  1. ,  0. ,  2. ,  2.5,  3.5])

It works as expected, but performance isn't great: the real dataset has ~2k rows and ~20-50 columns per frame, and frames arrive 60 times a second.

Is there a way to speed up the process, perhaps by not using the np.apply_along_axis function?

Upvotes: 3

Views: 1690

Answers (1)

Divakar

Reputation: 221574

Here's one vectorized approach: set the zeros to NaN, which lets us use np.nanmax and np.nanstd to compute the max and std values while ignoring the zeros. This matches get_y as long as img has no negative values, since get_y filters with stride > 0. Like so -

imgn = np.where(img==0, np.nan, img)
mx = np.nanmax(imgn,0) # np.max(img,0) if all are positive numbers
st = np.nanstd(imgn,0)
mask = img > mx - 1.5*st
out = np.arange(mask.shape[0]).dot(mask)/mask.sum(0)
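
To check this against the loop from the question, the steps above can be wrapped in a function and compared on the sample array. This is only a minimal sketch: the name `get_y_vectorized` is made up for illustration, and it assumes non-negative pixel values so that masking `img == 0` selects the same elements as the question's `stride > 0` filter.

```python
import numpy as np

def get_y(stride):
    # Reference implementation from the question (one column at a time).
    stride_vals = stride[stride > 0]
    pix_thresh = stride_vals.max() - 1.5 * stride_vals.std()
    return np.argwhere(stride > pix_thresh).mean()

def get_y_vectorized(img):
    # Hypothetical wrapper name; assumes img has no negative values.
    imgn = np.where(img == 0, np.nan, img)    # zeros become NaN so they are ignored
    mx = np.nanmax(imgn, axis=0)              # per-column max of the non-zero values
    st = np.nanstd(imgn, axis=0)              # per-column std of the non-zero values
    mask = img > mx - 1.5 * st                # True where a pixel exceeds its column threshold
    rows = np.arange(mask.shape[0])
    # Sum of the selected row indices divided by their count = mean row index.
    return rows.dot(mask) / mask.sum(axis=0)

img = np.array([
    [0,  5,  70, 0,  0,  0 ],
    [10, 50, 4,  4,  2,  0 ],
    [50, 10, 1,  42, 40, 1 ],
    [10, 0,  0,  6,  85, 64],
    [0,  0,  0,  1,  2,  90],
])

expected = np.apply_along_axis(get_y, 0, img)
result = get_y_vectorized(img)
print(result)  # values 2, 1, 0, 2, 2.5, 3.5, same as the loop version
```

One difference to be aware of: for a column that is entirely zero, `get_y` raises an error on the empty `max()`, whereas this version emits a runtime warning and yields NaN for that column.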

Runtime test -

In [94]: img = np.random.randint(-100,100,(2000,50))

In [95]: %timeit np.apply_along_axis(get_y, 0, img)
100 loops, best of 3: 4.36 ms per loop

In [96]: %%timeit
    ...: imgn = np.where(img==0, np.nan, img)
    ...: mx = np.nanmax(imgn,0)
    ...: st = np.nanstd(imgn,0)
    ...: mask = img > mx - 1.5*st
    ...: out = np.arange(mask.shape[0]).dot(mask)/mask.sum(0)
1000 loops, best of 3: 1.33 ms per loop

Thus, we are seeing a 3x+ speedup.
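
Note that `np.random.randint(-100,100,...)` produces negative values. For those, the `img==0` mask and the question's `stride > 0` filter select different elements, so the two snippets are timed on input where they do not compute the same thing. A sketch of a comparison on non-negative data that checks agreement before timing might look like the following. The seed, value range, repeat count and the helper name `vectorized` are assumptions chosen for illustration, and the timings will vary by machine.

```python
import timeit
import numpy as np

def get_y(stride):
    # Loop version from the question.
    stride_vals = stride[stride > 0]
    pix_thresh = stride_vals.max() - 1.5 * stride_vals.std()
    return np.argwhere(stride > pix_thresh).mean()

def vectorized(img):
    # Hypothetical helper wrapping the vectorized steps; assumes no negative values.
    imgn = np.where(img == 0, np.nan, img)
    mx = np.nanmax(imgn, axis=0)
    st = np.nanstd(imgn, axis=0)
    mask = img > mx - 1.5 * st
    return np.arange(mask.shape[0]).dot(mask) / mask.sum(axis=0)

rng = np.random.RandomState(0)           # fixed seed, chosen arbitrarily
img = rng.randint(0, 256, (2000, 50))    # non-negative, 8-bit-like values (assumed)

# Confirm both versions agree before comparing their speed.
loop_out = np.apply_along_axis(get_y, 0, img)
vec_out = vectorized(img)
same = np.allclose(loop_out, vec_out)

n = 20  # number of timed calls per version
t_loop = timeit.timeit(lambda: np.apply_along_axis(get_y, 0, img), number=n)
t_vec = timeit.timeit(lambda: vectorized(img), number=n)

print("results match:", same)
print("apply_along_axis: %.3f ms per call" % (t_loop / n * 1e3))
print("vectorized:       %.3f ms per call" % (t_vec / n * 1e3))
```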

Upvotes: 2
