Reputation: 945
Hi, I am doing scientific computing with numpy + numba. I've found that in-place numpy array addition is very slow compared to MATLAB.
Here is the MATLAB code:
tic;
% A,B are 2-d matrices, ind may not be distinct
for ii=1:N
    A(ind(ii),:) = A(ind(ii),:) + B(ii,:);
end
toc;
And here is the numpy code:
import time
s = time.time()
# A, B are numpy.ndarray; ind may not be distinct
for k in range(N):
    A[ind[k], :] += B[k, :]
print(time.time() - s)
The result shows that the numpy code is 10x slower than the MATLAB code, which confuses me a lot.
Moreover, when I pull the addition out of the for loop and compare a single matrix addition using numpy.add, numpy and MATLAB are comparable in speed.
One factor I know of is that MATLAB (version 2012a and later) uses a JIT to speed up for loops. I tried numba on the Python code, but it did not speed it up at all. I think this is because numba does not touch the numpy.add function, so the performance stays the same.
I am guessing that MATLAB does some aggressive caching for this case, which is why it beats numpy so dramatically.
Any suggestions on how to speed up numpy?
Upvotes: 4
Views: 2079
Reputation: 32521
Try
A[ind] += B[:N]
i.e. without any loop.
If ind could have duplicate elements, the line above applies only one update per repeated index, so use np.add.at instead:
np.add.at(A, ind, B[:N])
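To illustrate the difference on duplicate indices, here is a small sketch. The array sizes and values are made up for the example and are not from the question; the names A_fancy, A_at and A_loop are only illustrative.

```python
import numpy as np

# Hypothetical small inputs: 4 rows of B scattered into 3 rows of A.
B = np.arange(8, dtype=float).reshape(4, 2)  # rows: [0,1], [2,3], [4,5], [6,7]
ind = np.array([0, 2, 0, 1])                 # row 0 is targeted twice

A_fancy = np.zeros((3, 2))
A_at = np.zeros((3, 2))
A_loop = np.zeros((3, 2))

# Fancy-indexed in-place add: buffered, so a repeated index is updated only once.
A_fancy[ind] += B

# Unbuffered add: every occurrence of a repeated index is accumulated.
np.add.at(A_at, ind, B)

# Reference result: the explicit loop from the question.
for k in range(len(ind)):
    A_loop[ind[k], :] += B[k, :]

print(np.array_equal(A_at, A_loop))     # np.add.at matches the loop
print(np.array_equal(A_fancy, A_loop))  # the fancy-indexed version does not
```

When ind is known to be distinct, the plain fancy-indexed form gives the same result as the loop; np.add.at is only needed for the duplicate case.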
Upvotes: 3
Reputation: 231690
Here's a version that uses dot matrix multiplication. It constructs a matrix of 1s and 0s from ind.
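def bar(A, B, ind):
    # assumes ind has one entry per row of B
    K, M = B.shape
    N, M = A.shape
    I = np.zeros((N, K))
    # I[n, k] = 1 when row k of B is added to row n of A
    I[ind, np.arange(K)] = 1
    return A + np.dot(I, B)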
For a problem with sizes like K,M,N = 30,14,15 this is about 3x faster. But for larger ones like K,M,N = 300,100,150 it's a bit slower.
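As a rough check of the dot-product idea, the sketch below compares it against np.add.at on a tiny input with a repeated index. The function name scatter_add_dot and the test values are hypothetical, chosen only for this illustration.

```python
import numpy as np

def scatter_add_dot(A, B, ind):
    """Return A plus the rows of B accumulated at the row positions in ind."""
    K = B.shape[0]
    N = A.shape[0]
    # Selector matrix: sel[n, k] = 1 when row k of B goes to row n of A.
    sel = np.zeros((N, K))
    sel[ind, np.arange(K)] = 1
    # Each row of sel.dot(B) is the sum of the B rows assigned to that A row.
    return A + sel.dot(B)

# Hypothetical small inputs with a duplicate index (row 0 appears twice).
A = np.ones((3, 2))
B = np.arange(8, dtype=float).reshape(4, 2)
ind = np.array([0, 2, 0, 1])

result = scatter_add_dot(A, B, ind)

# Reference: accumulate in place on a copy with np.add.at.
expected = A.copy()
np.add.at(expected, ind, B)

print(np.allclose(result, expected))
```

Note that this version returns a new array instead of modifying A in place, and the selector matrix takes N*K entries of memory, which may be part of why it stops paying off at larger sizes.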
Upvotes: 0