Reputation: 2062
I am trying to compute the dot product of two numpy arrays sized respectively (162225, 10000) and (10000, 100). However, if I call numpy.dot(A, B) a MemoryError happens. I, then, tried to write my implementation:
def slower_dot (A, B):
"""Low-memory implementation of dot product"""
#Assuming A and B are of the right type and size
R = np.empty([A.shape[0], B.shape[1]])
for i in range(A.shape[0]):
for j in range(B.shape[1]):
R[i,j] = np.dot(A[i,:], B[:,j])
return R
and it works just fine, but is of course very slow. Any idea of 1) what is the reason behind this behaviour and 2) how I could circumvent / solve the problem?
I am using Python 3.4.2 (64bit) and Numpy 1.9.1 on a 64bit equipped computer with 16GB of ram running Ubuntu 14.10.
Upvotes: 4
Views: 7324
Reputation: 25833
The reason you're getting a memory error is probably because numpy is trying to copy one or both arrays inside the call to dot
. For small to medium arrays this is often the most efficient option, but for large arrays you'll need to micro-manage numpy in order to avoid the memory error. Your slower_dot
function is slow largely because of the python function call overhead, which you suffer 162225 x 100 times. Here is one common way of dealing with this kind of situation when you want to balance memory and performance limitations.
import numpy as np
def chunking_dot(big_matrix, small_matrix, chunk_size=100):
# Make a copy if the array is not already contiguous
small_matrix = np.ascontiguousarray(small_matrix)
R = np.empty((big_matrix.shape[0], small_matrix.shape[1]))
for i in range(0, R.shape[0], chunk_size):
end = i + chunk_size
R[i:end] = np.dot(big_matrix[i:end], small_matrix)
return R
You'll want to pick the chunk_size that works best for your specific array sizes. Typically larger chunk sizes will be faster as long as everything fits in memory.
Upvotes: 6
Reputation: 313
I think the problem starts from the matrix A itself as a 16225 * 10000 size matrix already occupies about 12GB of memory if each element is a double precision floating point number. That together with how numpy creates temporary copies to do the dot operation will cause the error. The extra copies is because numpy uses the underlying BLAS operations for dot which needs the matrices to be stored in contiguous C order
Check out these links if you want more discussions about improving dot performance
http://wiki.scipy.org/PerformanceTips
https://github.com/numpy/numpy/pull/2730
Upvotes: 2