konstant

Reputation: 705

What could be the best way to bypass `MemoryError` in this case?

I have two fairly large NumPy arrays: arr1 with shape (40, 40, 3580) and arr2 with shape (3580, 50). What I want to compute is

arr_final = np.sum(arr1[..., None]*arr2, axis = 2)

so that arr_final has shape (40, 40, 50). However, when I do this, Python probably caches the internal array operations, so I keep getting a memory error. Is there a way to avoid the internal caching and get just the final result? I have looked at numexpr, but I am not sure how to express arr1[..., None]*arr2 followed by a sum over axis=2 in numexpr. Any help or suggestions would be appreciated.

Upvotes: 1

Views: 393

Answers (1)

user2357112

Reputation: 281843

Assuming you meant np.sum(arr1[..., None]*arr2, axis = 2), with a ... instead of a :, then that's just dot:

arr3 = arr1.dot(arr2)

This should be more efficient than explicitly materializing arr1[..., None]*arr2, but I don't know exactly what intermediates it allocates.
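As a quick sanity check, here is a minimal sketch comparing the broadcast-and-sum form with dot. The small shapes, the random data, and the float64 dtype are assumptions made for illustration; the question's real shapes are (40, 40, 3580) and (3580, 50).

```python
import numpy as np

# Small stand-in shapes (assumed for illustration only); the question
# uses (40, 40, 3580) and (3580, 50).
rng = np.random.default_rng(0)
arr1 = rng.standard_normal((4, 4, 30))
arr2 = rng.standard_normal((30, 5))

# Broadcast form: builds the full (4, 4, 30, 5) intermediate, then sums.
via_broadcast = np.sum(arr1[..., None] * arr2, axis=2)

# dot contracts the last axis of arr1 with the first axis of arr2.
via_dot = arr1.dot(arr2)

print(via_dot.shape)
print(np.allclose(via_broadcast, via_dot))

# Size of the broadcast intermediate at the question's shapes,
# assuming 8-byte float64 elements.
intermediate_bytes = 40 * 40 * 3580 * 50 * 8
print(intermediate_bytes)  # 2291200000
```

Under the float64 assumption, the explicit intermediate alone would need about 2.3 GB, which is consistent with the MemoryError in the question.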

You can also express the computation with einsum. Again, this should be more efficient than explicitly materializing arr1[..., None]*arr2, but I don't know exactly what it allocates.

arr3 = np.einsum('ijk,kl', arr1, arr2)
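If the implicit subscripts look opaque, the output can be spelled out. This is a small sketch on assumed stand-in shapes, not the question's data. In implicit mode, einsum sums over the repeated index k and orders the remaining indices alphabetically, so 'ijk,kl' should behave like 'ijk,kl->ijl'.

```python
import numpy as np

# Small stand-in shapes (assumed for illustration only).
rng = np.random.default_rng(1)
arr1 = rng.standard_normal((4, 4, 30))
arr2 = rng.standard_normal((30, 5))

# Implicit mode: k is repeated, so it is summed; output axes are i, j, l.
implicit = np.einsum('ijk,kl', arr1, arr2)

# Explicit mode: same contraction with the output axes written out.
explicit = np.einsum('ijk,kl->ijl', arr1, arr2)

print(explicit.shape)
print(np.allclose(implicit, explicit))
print(np.allclose(explicit, arr1.dot(arr2)))
```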

Upvotes: 3
