Reputation: 6132
I have two matrices -- A is 3033x3033, and X is 3033x20. I am running the following lines (as suggested in the answer to another question I asked):
n, d = X.shape
c = X.reshape(n, -1, d) - X.reshape(-1, n, d)
return np.tensordot(A.reshape(n, n, -1) * c, c, axes=[(0,1),(0,1)])
On the final line, Python simply stops and says "MemoryError". How can I get around this, either by changing some setting in Python or performing this operation in a more memory-efficient way?
Upvotes: 3
Views: 1048
Reputation: 1023
Here is a function that does the calculation without any for loops and without any large temporary array. See the related question for a longer answer, complete with a test script.
def fbest(A, X):
""
KA_best = np.tensordot(A.sum(1)[:,None] * X, X, axes=[(0,), (0,)])
KA_best += np.tensordot(A.sum(0)[:,None] * X, X, axes=[(0,), (0,)])
KA_best -= np.tensordot(np.dot(A, X), X, axes=[(0,), (0,)])
KA_best -= np.tensordot(X, np.dot(A, X), axes=[(0,), (0,)])
return KA_best
I profiled the code with your size arrays:
I love sp.einsum by the way. It is a great place to start when speeding up array operations by removing for loops. You can do SOOOO much with one call to sp.einsum.
The advantage of np.tensordot is that it links to whatever fast numerical library you have installed (i.e. MKL). So, tensordot will run faster and in parallel when you have the right libraries installed.
Upvotes: 3
Reputation: 221554
You can employ a two-nested loop format iterating along the last dimension of X
. Now, that last dimension is 20
, so hopefully it would still be efficient enough and more importantly leave minimum memory footprint. Here's the implementation -
n, d = X.shape
c = X.reshape(n, -1, d) - X.reshape(-1, n, d)
out = np.empty((d,d)) # d is a small number: 20
for i in range(d):
for j in range(d):
out[i,j] = (A*c[:,:,i]*(c[:,:,j])).sum()
return out
You can replace the last line with np.einsum
-
out[i,j] = np.einsum('ij->',A*c[:,:,i]*c[:,:,j])
Upvotes: 0
Reputation: 30889
If you replace the final line with
return np.einsum('ij,ijk,ijl->kl',A,c,c)
you avoid creating the A.reshape(n, n, -1) * c
(3301 by 3301 by 20) intermediate that I think is your main problem.
My impression is that the version I give is probably slower (for cases where it doesn't run out of memory), but I haven't rigourously timed it.
It's possible you could go further and avoid creating c
, but I can't immediately see how to do it. It'd be a case of following writing the whole thing in terms of sums of matrix indicies and seeing what it simplified to.
Upvotes: 2