Egor

Reputation: 15

Exploiting Monotonicity in a NumPy Array

I have a large NumPy array, and I want to apply some function elementwise to every nonnegative element and replace the rest with -inf. The array is basically an outer difference of two vectors (see the example below if this is not clear), so it has a particular structure: each row starts with positive numbers and at some point turns negative. For instance, when I subtract [1,...,10] from [3,4,5] I get the matrix

2  1  0 -1 -2 -3 ...
3  2  1  0 -1 -2 ...
4  3  2  1  0 -1 ...

(of course, in the real application the vectors are not evenly spaced integers)

and I want to get

f(2)  f(1)  f(0)  -inf -inf -inf ...
f(3)  f(2)  f(1)  f(0) -inf -inf ...
f(4)  f(3)  f(2)  f(1) f(0) -inf ...

There is a straightforward solution:

import numpy as np

x = np.arange(2000,dtype=np.float64)
y = np.arange(4000,dtype=np.float64)

M = x[:,None] - y[None,:]

i_pos = M>=0

M[i_pos] = np.sqrt(M[i_pos])
M[~i_pos] = -np.inf


It works but seems pretty inefficient, since the search for nonnegative elements does not exploit the monotonicity: once we encounter a negative number while traveling along a row, there is no need to look further.
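For illustration, since y is sorted ascending, the cutoff column of each row can be found directly with np.searchsorted, so the function only has to be applied left of it; a sketch of that idea (not benchmarked):

import numpy as np

x = np.arange(2000,dtype=np.float64)
y = np.arange(4000,dtype=np.float64)

# for each x[i], count how many y[k] satisfy y[k] <= x[i]; since y is
# sorted, everything left of this column is nonnegative in row i
cut = np.searchsorted(y, x, side='right')

M = np.full((x.size, y.size), -np.inf)
for i, c in enumerate(cut):
    M[i, :c] = np.sqrt(x[i] - y[:c])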

More general example

x = np.arange(10000,dtype=np.float64).reshape(100,100)
y = np.arange(4000,dtype=np.float64)

M = x[:,:,None] - y[None,None,:]

i_pos = M>0

M[i_pos] = np.sqrt(M[i_pos])
M[~i_pos] = -np.inf

takes more than one second.

Is there a way to reasonably exploit this structure to get some speed improvement? I tried writing a Numba function with an explicit loop, but it did not seem to be much faster (1.38 s vs. 1.18 s, plus compilation time).
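For concreteness, a sketch of what such an early-exit Numba loop could look like for the 3-d case (assuming y is sorted ascending; illustrative only, not the exact function behind the timings above):

import numpy as np
import numba as nb

@nb.njit(parallel=True)
def calc_M_early_exit(x, y, res):
    for i in nb.prange(x.shape[0]):
        for j in range(x.shape[1]):
            k = 0
            # y is sorted ascending, so once x[i,j]-y[k] drops below
            # zero it stays negative for the rest of the row
            while k < y.shape[0] and x[i, j] - y[k] > 0:
                res[i, j, k] = np.sqrt(x[i, j] - y[k])
                k += 1
            for k2 in range(k, y.shape[0]):
                res[i, j, k2] = -np.inf
    return res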

Upvotes: 0

Views: 212

Answers (1)

max9111

Reputation: 6492

Memory allocations are the limiting factor

If you use functions like sqrt, exp, sin or cos, make sure you have Intel SVML installed. Please also add everything you have tried so far and how you obtained your timings.
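SVML ships in the icc_rt package on the numba conda channel, and Numba's own system report shows whether it was picked up (both commands are documented Numba usage):

conda install -c numba icc_rt
python -m numba -s

The -s report contains an SVML section stating whether the library was found and is being used.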

Your NumPy solution creates several temporary arrays (the boolean masks and the fancy-indexing results), so there should be quite some room for improvement.

Example

import numpy as np
import numba as nb

def calc_M(x,y):
    M = x[:,:,None] - y[None,None,:]

    i_pos = M>0

    M[i_pos] = np.sqrt(M[i_pos])
    M[~i_pos] = -np.inf
    return M

@nb.njit(parallel=True)
def calc_M_nb_1(x,y):
    res=np.empty((x.shape[0],x.shape[1],y.shape[0]))
    for i in nb.prange(x.shape[0]):
        for j in range(x.shape[1]):
            for k in range(y.shape[0]):
                val= x[i,j]-y[k]
                if val>0:
                    res[i,j,k]=np.sqrt(val)
                else:
                    res[i,j,k]=-np.inf
    return res

@nb.njit(parallel=True)
def calc_M_nb_2(x,y,res):
    for i in nb.prange(x.shape[0]):
        for j in range(x.shape[1]):
            for k in range(y.shape[0]):
                val= x[i,j]-y[k]
                if val>0:
                    res[i,j,k]=np.sqrt(val)
                else:
                    res[i,j,k]=-np.inf
    return res

Timings

x = np.arange(10000,dtype=np.float64).reshape(100,100)
y = np.arange(4000,dtype=np.float64)
res=np.empty((x.shape[0],x.shape[1],y.shape[0]))

%timeit res_1=calc_M(x,y)
469 ms ± 3.42 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit res_2=calc_M_nb_1(x,y)
79.5 ms ± 671 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
res=np.empty((x.shape[0],x.shape[1],y.shape[0]))
%timeit res_3=calc_M_nb_2(x,y,res)
25.5 ms ± 204 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Comparing the timings of calc_M_nb_1 and calc_M_nb_2 shows that the memory allocation takes more time than the whole calculation, which is not rare for simple functions. Of course it is not always possible to avoid memory allocations, but if you want to apply this function to many inputs of the same shape, reusing the output array can give quite some speedup.
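For illustration, that reuse pattern could look like this (the batch of random same-shaped inputs is made up, only the shapes matter):

# hypothetical batch of same-shaped inputs; allocate the output once
# and let calc_M_nb_2 overwrite it on every call
res = np.empty((x.shape[0], x.shape[1], y.shape[0]))
for seed in range(5):
    rng = np.random.default_rng(seed)
    x_i = rng.uniform(0.0, 10000.0, size=x.shape)
    calc_M_nb_2(x_i, y, res)
    # consume res here before the next iteration overwrites it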

Upvotes: 1
