Nihar Karve

Reputation: 240

Making numpy einsum faster for multidimensional tensors

I have some code that uses the following einsum:

y = np.einsum('wxyijk,ijkd->wxyd', x, f)

where, for example, x has shape (64, 26, 26, 3, 3, 3) and f has shape (3, 3, 3, 1), both with dtype=float.

%timeit np.einsum('wxyijk,ijkd->wxyd', x, f)
# 2.01 ms ± 55.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

This is too slow for my application, which is time critical. Neither using the GPU (via CuPy) nor path speedups (via opt-einsum) seems to make this any faster. Is there any way to make it faster natively in NumPy, or is this just about as fast as it's going to get?
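For reference, a minimal sketch of what I tried with opt-einsum (random arrays standing in for my real data, same shapes as above):

import numpy as np
import opt_einsum as oe

x = np.random.rand(64, 26, 26, 3, 3, 3)
f = np.random.rand(3, 3, 3, 1)

# opt_einsum chooses a contraction path automatically, but with only two
# operands there is little room for path optimization, so it was not
# noticeably faster than plain np.einsum for me
y = oe.contract('wxyijk,ijkd->wxyd', x, f)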

Upvotes: 0

Views: 716

Answers (1)

max9111

Reputation: 6482

In this case you could use the optimize keyword, implement the contraction yourself, or use tensordot. All but the first version should actually do the same thing under the hood (reshape -> dot -> reshape).

Your implementation

import numpy as np

x = np.random.rand(64, 26, 26, 3, 3, 3)
f = np.random.rand(3, 3, 3, 1)
%timeit y = np.einsum('wxyijk,ijkd->wxyd', x, f)
#886 µs ± 3.16 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

With optimize="optimal"

%timeit y = np.einsum('wxyijk,ijkd->wxyd', x, f, optimize="optimal")
#275 µs ± 23.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
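If you want to see which contraction path the optimizer picks (and its estimated speedup), you can print the report from np.einsum_path, e.g.:

path, info = np.einsum_path('wxyijk,ijkd->wxyd', x, f, optimize="optimal")
# the report lists the chosen contraction order, the naive and optimized
# FLOP counts, and the size of the largest intermediate
print(info)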

Reshaping and a BLAS call

This normally gives performance quite comparable to optimize="optimal"; in this case an unnecessary array copy may account for the remaining slowdown.

def contract(x, f):
    s1 = x.shape
    # flatten the leading (w, x, y) axes and the trailing (i, j, k) axes
    x_ = x.reshape(s1[0]*s1[1]*s1[2], s1[3]*s1[4]*s1[5])

    s2 = f.shape
    # flatten (i, j, k) to match; keep the output axis d
    f_ = f.reshape(s2[0]*s2[1]*s2[2], s2[3])

    # one BLAS matrix multiplication, then restore the (w, x, y, d) shape
    return np.dot(x_, f_).reshape(s1[0], s1[1], s1[2], s2[3])

%timeit contract(x,f)
#144 µs ± 3.09 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
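A quick sanity check that the reshaped BLAS version agrees with the plain einsum (up to floating-point rounding):

y_einsum = np.einsum('wxyijk,ijkd->wxyd', x, f)
y_blas = contract(x, f)
assert y_blas.shape == (64, 26, 26, 1)
assert np.allclose(y_einsum, y_blas)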

Tensordot

%timeit np.tensordot(x, f, axes=3)
#176 µs ± 4.32 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
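Here axes=3 tells tensordot to contract the last three axes of x against the first three axes of f, which is exactly the ijk summation in the einsum; the explicit-axes spelling below is equivalent:

# equivalent to axes=3: sum over x's axes (3, 4, 5) and f's axes (0, 1, 2)
y_td = np.tensordot(x, f, axes=([3, 4, 5], [0, 1, 2]))
assert np.allclose(y_td, np.einsum('wxyijk,ijkd->wxyd', x, f))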

Upvotes: 1
