Reputation: 2139
I noticed that np.einsum
is faster when it reduces one dimension
import numpy as np
a = np.random.random((100,100,100))
b = np.random.random((100,100,100))
%timeit np.einsum('ijk,ijk->ijk',a,b)
# 100 loops, best of 3: 3.83 ms per loop
%timeit np.einsum('ijk,ijk->ij',a,b)
# 1000 loops, best of 3: 937 µs per loop
%timeit np.einsum('ijk,ijk->i',a,b)
# 1000 loops, best of 3: 921 µs per loop
%timeit np.einsum('ijk,ijk->',a,b)
# 1000 loops, best of 3: 928 µs per loop
This seems very weird to me, as I would expect it to first generate the new array and then sum over it, which is obviously not happening. What is going on here? Why does it get faster when one dimension is dropped, but not get faster again when further dimensions are dropped?
Side note: I first thought it had to do with creating a large output array with many dimensions, but I don't think that is the case:
%timeit np.ones(a.shape)
# 1000 loops, best of 3: 1.79 ms per loop
%timeit np.empty(a.shape)
# 100000 loops, best of 3: 3.05 µs per loop
So creating a new array is much faster than the difference I observe.
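For reference, the reduced specs do compute the same values as forming the product array and then summing over the dropped axes, so the difference is purely in how the computation is carried out internally:

```python
import numpy as np

a = np.random.random((10, 10, 10))
b = np.random.random((10, 10, 10))

# the reduced einsum specs match an explicit multiply-then-sum
assert np.allclose(np.einsum('ijk,ijk->ij', a, b), (a * b).sum(axis=2))
assert np.allclose(np.einsum('ijk,ijk->', a, b), (a * b).sum())
```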
Upvotes: 0
Views: 634
Reputation: 231395
einsum is implemented in compiled code, in numpy/numpy/core/src/multiarray/einsum.c.src.
The core operation is to iterate over all dimensions (in your case 100*100*100 times) using the C version of nditer, applying the sum-of-products calculation defined by the ijk subscript string.
But it does various optimizations, including generating views if no multiplication is required. So it will require careful study to see what's different in your cases.
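One such optimization is visible from Python: a spec that only permutes axes involves no multiplication or summation at all, and on recent NumPy versions einsum then returns a view of its input rather than a new array. A small sketch (the view behavior is what I observe on current NumPy; older versions may differ):

```python
import numpy as np

a = np.random.random((4, 5, 6))

# a pure-transposition spec needs no multiplication or summation,
# so einsum can hand back a view of the input buffer
t = np.einsum('ijk->kji', a)

print(np.shares_memory(t, a))  # True on recent NumPy
print(t.shape)                 # (6, 5, 4)
```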
The time divide is between producing a 3d output without summation, and producing one that sums over one or more axes.
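A rough sketch of why that divide matters: all four specs perform the same 10**6 multiplications, but the 'ijk' output has to allocate and write 10**6 doubles, while the reduced specs accumulate into much smaller outputs:

```python
import numpy as np

a = np.random.random((100, 100, 100))
b = np.random.random((100, 100, 100))

full = np.einsum('ijk,ijk->ijk', a, b)   # writes 10**6 doubles
scalar = np.einsum('ijk,ijk->', a, b)    # writes a single double

# the arithmetic is identical; only the amount of output differs
assert np.allclose(full.sum(), scalar)
print(full.nbytes)   # 8_000_000 bytes of output vs 8 for the scalar
```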
Upvotes: 1