Reputation: 31
I want to perform the following contraction:
np.einsum('ijk,kl,jlm', x, y, z, optimize = 'optimal')
Testing performance with numpy, I know that for my data the optimal path is almost always (if this makes sense): first k, then j, then l.
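For reference, this is roughly how I inspect the chosen path with numpy's np.einsum_path (the shapes below are just placeholders for my real data):

import numpy as np

x = np.random.rand(10, 20, 30)   # indices 'ijk'
y = np.random.rand(30, 40)       # indices 'kl'
z = np.random.rand(20, 40, 50)   # indices 'jlm'

path, info = np.einsum_path('ijk,kl,jlm', x, y, z, optimize='optimal')
print(path)   # e.g. ['einsum_path', (0, 1), (0, 1)] -- the pairwise contraction order
print(info)   # detailed report: intermediates, scaling, FLOP estimates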
But just in case, I want to allow PyTorch to find the optimal path. However, reading the documentation I found the following:
This optimization occurs when there are at least three inputs, since the order does not matter otherwise. Note that finding the optimal path is an NP-hard problem, thus, opt_einsum relies on different heuristics to achieve near-optimal results
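If I read the torch.backends.opt_einsum docs correctly, the path strategy would be selected globally, something like the sketch below (this assumes the opt_einsum package is installed; the shapes are again placeholders):

import torch

x = torch.rand(10, 20, 30)   # indices 'ijk'
y = torch.rand(30, 40)       # indices 'kl'
z = torch.rand(20, 40, 50)   # indices 'jlm'

torch.backends.opt_einsum.strategy = 'optimal'   # default is 'auto'
out = torch.einsum('ijk,kl,jlm', x, y, z)        # result has indices 'im'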
Maybe I don't fully understand what the optimal path actually is, but why does the documentation say it is an NP-hard problem? In essence, isn't it equivalent to a sorting algorithm? Or what exactly makes it scale non-polynomially (the number of tensors, the number of contractions, or the total number of indices)?
Furthermore, I want to incorporate this structure into a custom layer, where my learnable parameters will be in the tensor z. To facilitate backward propagation, should I split the product so that my learnable parameters are involved in the fewest operations, i.e. the PyTorch equivalent of:
np.einsum('ijl,jlm', np.einsum('ijk,kl', x, y, optimize = 'optimal'), z, optimize = 'optimal')
or should I stick with the optimal path?
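Concretely, the kind of layer I have in mind looks something like this (dimension sizes and names are hypothetical, and the two options correspond to the two contraction strategies above):

import torch
import torch.nn as nn

class ContractionLayer(nn.Module):
    def __init__(self, J, L, M):
        super().__init__()
        # learnable tensor z with indices 'jlm'
        self.z = nn.Parameter(torch.randn(J, L, M))

    def forward(self, x, y):
        # Option A: hand the full three-operand contraction to einsum
        #   return torch.einsum('ijk,kl,jlm', x, y, self.z)
        # Option B: split it so self.z only enters one binary contraction
        xy = torch.einsum('ijk,kl->ijl', x, y)          # contract k first
        return torch.einsum('ijl,jlm->im', xy, self.z)  # then j and l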
Upvotes: 1
Views: 131