numpy.dot -> MemoryError, my_dot -> very slow, but works. Why?

Question

I am trying to compute the dot product of two numpy arrays of shape (162225, 10000) and (10000, 100), respectively. However, calling numpy.dot(A, B) raises a MemoryError. I then tried writing my own implementation:

import numpy as np

def slower_dot(A, B):
    """Low-memory implementation of the dot product."""
    # Assumes A and B are 2-D arrays with compatible shapes
    R = np.empty([A.shape[0], B.shape[1]])
    for i in range(A.shape[0]):
        for j in range(B.shape[1]):
            # One row of A times one column of B gives a single output element
            R[i, j] = np.dot(A[i, :], B[:, j])
    return R

This works fine but is, of course, very slow. Any idea 1) what causes this behaviour and 2) how I could work around or solve the problem?

I am using Python 3.4.2 (64-bit) and Numpy 1.9.1 on a 64-bit machine with 16 GB of RAM running Ubuntu 14.10.

neiht · Accepted Answer

I think the problem starts with the matrix A itself: a 162225 * 10000 matrix already occupies about 12 GB of memory if each element is a double-precision floating-point number. That, together with the temporary copies numpy creates to perform the dot operation, will cause the error. The extra copies arise because numpy uses the underlying BLAS routines for dot, which need the matrices to be stored in contiguous C order.
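To make the arithmetic concrete, here is a rough sketch of the memory estimate, using the shapes from the question and assuming float64 elements (8 bytes each). The helper name `nbytes` and the small stand-in array are only for illustration. The last lines show how to check whether an array is C-contiguous; whether numpy actually makes a temporary copy for a given call depends on the numpy version, dtype, and layout, so treat this as a diagnostic rather than a guarantee.

```python
import numpy as np

# Shapes taken from the question; float64 elements are an assumption.
a_shape, b_shape = (162225, 10000), (10000, 100)
itemsize = np.dtype(np.float64).itemsize  # 8 bytes per element

def nbytes(shape, itemsize):
    """Bytes needed for a dense 2-D array of the given shape (hypothetical helper)."""
    return shape[0] * shape[1] * itemsize

a_bytes = nbytes(a_shape, itemsize)
b_bytes = nbytes(b_shape, itemsize)
r_bytes = nbytes((a_shape[0], b_shape[1]), itemsize)

gib = 1024.0 ** 3
print("A:      %.2f GiB" % (a_bytes / gib))
print("B:      %.4f GiB" % (b_bytes / gib))
print("result: %.2f GiB" % (r_bytes / gib))
# One extra copy of A alone would roughly double the footprint.
print("A + one copy of A: %.2f GiB" % (2 * a_bytes / gib))

# A small stand-in array shows how to inspect the memory layout.
small = np.ones((6, 4))
print(small.flags['C_CONTIGUOUS'])    # True: freshly allocated, row-major
print(small.T.flags['C_CONTIGUOUS'])  # False: a transposed view
# np.ascontiguousarray copies only when the input is not already C-contiguous.
fixed = np.ascontiguousarray(small.T)
print(fixed.flags['C_CONTIGUOUS'])    # True
```

With 16 GB of RAM, A alone leaves little headroom, so even one partial copy of it is enough to run out of memory.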

Check out these links if you want more discussion about improving dot performance:

http://wiki.scipy.org/PerformanceTips

Speeding up numpy.dot

https://github.com/numpy/numpy/pull/2730
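As one possible workaround for the second part of the question, a middle ground between numpy.dot on the whole matrix and the element-by-element loop is to multiply A one block of rows at a time. This is only a sketch: the function name `blockwise_dot` and the `block_rows` default are made up for illustration, and I have not tested it on arrays of the size in the question. The idea is that if numpy does need a temporary copy, it is a copy of one block rather than of all of A, while each block still goes through the fast dot routine.

```python
import numpy as np

def blockwise_dot(A, B, block_rows=1024):
    """Compute A.dot(B) one block of rows of A at a time (illustrative sketch)."""
    out_dtype = np.result_type(A.dtype, B.dtype)
    R = np.empty((A.shape[0], B.shape[1]), dtype=out_dtype)
    for start in range(0, A.shape[0], block_rows):
        stop = min(start + block_rows, A.shape[0])
        # A[start:stop] is a view of A; only this block is handed to np.dot,
        # so any temporary created for the call is at most one block in size.
        R[start:stop] = np.dot(A[start:stop], B)
    return R

# Small stand-in arrays to check the result against a plain np.dot call.
rng = np.random.RandomState(0)
A_small = rng.rand(250, 40)
B_small = rng.rand(40, 7)
R_block = blockwise_dot(A_small, B_small, block_rows=64)
print(np.allclose(R_block, np.dot(A_small, B_small)))  # True
```

A larger block_rows means fewer Python-level iterations but bigger temporaries, so the value is worth tuning against the available memory.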
