Reputation: 15
I have a large NumPy array and I want to apply some function element-wise where the element is non-negative and replace it with -inf otherwise. The array is essentially an outer difference of two vectors (see the example below if this is not clear), so it has a particular structure: each row starts with non-negative numbers and at some point turns negative. For instance, when I subtract [1,...,10] from [3,4,5] I get the matrix
2 1 0 -1 -2 -3 ...
3 2 1 0 -1 -2 ...
4 3 2 1 0 -1 ...
(of course, in the real application the vectors are not evenly spaced integers)
and I want to get
f(2) f(1) f(0) -inf -inf -inf ...
f(3) f(2) f(1) f(0) -inf -inf ...
f(4) f(3) f(2) f(1) f(0) -inf ...
There is a straightforward solution:
import numpy as np
x = np.arange(2000, dtype=np.float64)
y = np.arange(4000, dtype=np.float64)
M = x[:, None] - y[None, :]   # outer difference x_i - y_j
i_pos = M >= 0
M[i_pos] = np.sqrt(M[i_pos])  # f = sqrt in this example
M[~i_pos] = -np.inf
It works, but it seems rather inefficient: the search for non-negative elements does not exploit the monotonicity (once we encounter a negative number while traveling along a row, there is no need to look any further).
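For reference, here is a rough sketch of the kind of structure exploitation I have in mind, using np.searchsorted (this assumes y is sorted ascending, as in the example; the Python-level row loop may eat the savings when there are many short rows):
import numpy as np

x = np.arange(2000, dtype=np.float64)
y = np.arange(4000, dtype=np.float64)

M = np.full((x.size, y.size), -np.inf)     # start from -inf everywhere
cut = np.searchsorted(y, x, side='right')  # per row: first index where x[i] - y[k] < 0
for i, c in enumerate(cut):
    M[i, :c] = np.sqrt(x[i] - y[:c])       # apply f only to the non-negative prefix
Even without exploiting monotonicity, the two boolean-indexing passes can be fused into a single ufunc call, e.g. np.sqrt(M, out=np.full_like(M, -np.inf), where=M >= 0).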
More general example
x = np.arange(10000, dtype=np.float64).reshape(100, 100)
y = np.arange(4000, dtype=np.float64)
M = x[:, :, None] - y[None, None, :]
i_pos = M > 0
M[i_pos] = np.sqrt(M[i_pos])
M[~i_pos] = -np.inf
takes more than one second.
Is there a way to reasonably exploit this structure to get some speed improvement? I tried to write a Numba function with a loop, but it did not seem to be much faster (1.38 s vs 1.18 s, plus compilation time).
Upvotes: 0
Views: 212
Reputation: 6492
If you use functions like sqrt, exp, sin or cos, make sure you have Intel SVML installed. Please also add everything you have tried so far and how you obtained your timings.
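One way to check whether Numba actually picked up SVML (assuming a reasonably recent Numba; the flag below is the one Numba's own sysinfo report prints) is:
import numba
print(numba.config.USING_SVML)  # True if SVML was detected at import time
or run numba -s from the command line and look at the SVML section of the report.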
Your NumPy solution uses temporary arrays, so there should be quite some room for improvement.
Example
import numpy as np
import numba as nb

def calc_M(x, y):
    # Pure NumPy reference: builds temporaries for M, i_pos and ~i_pos
    M = x[:, :, None] - y[None, None, :]
    i_pos = M > 0
    M[i_pos] = np.sqrt(M[i_pos])
    M[~i_pos] = -np.inf
    return M

@nb.njit(parallel=True)
def calc_M_nb_1(x, y):
    # Numba loop that allocates the result inside the function
    res = np.empty((x.shape[0], x.shape[1], y.shape[0]))
    for i in nb.prange(x.shape[0]):
        for j in range(x.shape[1]):
            for k in range(y.shape[0]):
                val = x[i, j] - y[k]
                if val > 0:
                    res[i, j, k] = np.sqrt(val)
                else:
                    res[i, j, k] = -np.inf
    return res

@nb.njit(parallel=True)
def calc_M_nb_2(x, y, res):
    # Same loop, but writes into a preallocated output array
    for i in nb.prange(x.shape[0]):
        for j in range(x.shape[1]):
            for k in range(y.shape[0]):
                val = x[i, j] - y[k]
                if val > 0:
                    res[i, j, k] = np.sqrt(val)
                else:
                    res[i, j, k] = -np.inf
    return res
Timings
x = np.arange(10000, dtype=np.float64).reshape(100, 100)
y = np.arange(4000, dtype=np.float64)

%timeit res_1 = calc_M(x, y)
469 ms ± 3.42 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit res_2 = calc_M_nb_1(x, y)
79.5 ms ± 671 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
res = np.empty((x.shape[0], x.shape[1], y.shape[0]))
%timeit res_3 = calc_M_nb_2(x, y, res)
25.5 ms ± 204 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Comparing the timings of calc_M_nb_2 and calc_M_nb_1 shows that the memory allocation takes more time than the whole calculation, which is not a rare situation for simple functions.
Of course it is not always possible to avoid memory allocations, but if you want to apply this function to many inputs of the same shape, reusing a preallocated output array can give quite some speedup, as in the sketch below.
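For illustration, the reuse pattern could look like this (batches and process are hypothetical placeholders for your own data source and consumer):
res = np.empty((x.shape[0], x.shape[1], y.shape[0]))  # allocate once
for x_batch in batches:           # hypothetical iterable of arrays shaped like x
    calc_M_nb_2(x_batch, y, res)  # fills res in place, no new allocation
    process(res)                  # hypothetical consumer of the result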
Upvotes: 1