Reputation: 795
I have the following problem at hand. F is a NumPy array of dimensions 2 x 100 x 65. I want to generate another array V whose dimensions are 2 x 2 x 65. This array V must be computed in the following way:
For each t, V[:, :, t] = F[:, :, t] @ F[:, :, t].T
where .T means a matrix transpose and @ means the usual matrix multiplication. Right now, I am using the following approach:
aux_matrix = np.matmul(np.transpose(F, (2, 0, 1)), np.transpose(F, (2, 1, 0)))  # (65, 2, 100) @ (65, 100, 2) -> (65, 2, 2)
V = np.transpose(aux_matrix, (1, 2, 0))  # move the stacking axis back to the end: (2, 2, 65)
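For concreteness, this is the plain loop that the above replaces (a minimal sketch; the random F here just stands in for my real data):
import numpy as np

F = np.random.rand(2, 100, 65)  # placeholder data, same shape as my real array

# Direct translation of the definition: one small matmul per t.
V_loop = np.empty((2, 2, 65))
for t in range(F.shape[2]):
    V_loop[:, :, t] = F[:, :, t] @ F[:, :, t].T

# The transpose/matmul approach gives the same result.
aux_matrix = np.matmul(np.transpose(F, (2, 0, 1)), np.transpose(F, (2, 1, 0)))
V = np.transpose(aux_matrix, (1, 2, 0))
assert np.allclose(V, V_loop)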
I understand that np.tensordot and np.einsum can help with this kind of situation and make things both faster and more elegant. However, I am new to tensors and I am not used to Einstein notation. Can someone shed some light on how to perform this computation, and maybe link a reference suitable for a beginner? Thanks!
Upvotes: 0
Views: 437
Reputation: 3272
As the solution in the comment says, the einsum equivalent would be:
np.einsum("ijk,njk->ink", F, F)
Following the rules of einsum, axis=1 and axis=2 of both arrays (the axes corresponding to the labels j and k) are going to get element-wise multiplied. Of these, j is missing from the final output and k is present. That means the axis corresponding to j will get summed up in the final result, while k stops at element-wise multiplication.
In general, if a label is repeated across the inputs, those axes are element-wise multiplied, and if that label is also missing from the final output, summation takes place on top of the element-wise multiplication.
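A tiny sketch of these two rules in isolation (the 1-D arrays here are made up purely for illustration):
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Repeated label i, kept in the output: element-wise multiplication only.
print(np.einsum("i,i->i", a, b))  # [ 4 10 18]

# Repeated label i, dropped from the output: multiply, then sum over i.
print(np.einsum("i,i->", a, b))   # 32, same as np.dot(a, b)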
This is exactly what's happening here. F has shape (2, 100, 65), so "ijk,njk->ink" will do the following in this case:
- axis=0 of the first array gets operated on with every element of axis=0 of the second array. This is what i,n->in in the string represents. Here i=2 and n=2, and the final array is to have its first two dimensions given by in, so those first two dimensions will have shape (2, 2).
- j is repeated in both arrays, so those axes are multiplied element-wise. However, j is missing from the final output, so the result gets summed over along that direction. That is, the dimension corresponding to 100 is missing from the final output.
- k is repeated as well, but it is still present in the final output, so no sum-reduction takes place. The final output keeps the axis corresponding to 65.
If you check the shape of the final output, it's (2, 2, 65), as expected.
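To see this concretely (a sketch, with random data standing in for F):
import numpy as np

F = np.random.rand(2, 100, 65)

V_einsum = np.einsum("ijk,njk->ink", F, F)
print(V_einsum.shape)  # (2, 2, 65)

# It also matches the matmul/transpose formulation from the question.
aux = np.matmul(np.transpose(F, (2, 0, 1)), np.transpose(F, (2, 1, 0)))
V_matmul = np.transpose(aux, (1, 2, 0))
assert np.allclose(V_einsum, V_matmul)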
I hope this clears up the doubt that OP expressed in the comment.
However, it's not correct to assume that this is automatically superior to the matmul formulation in terms of performance. In terms of readability, maybe. The actual performance depends on the sizes and relative dimensions of the arrays, among other factors.
It's worth checking whether performance changes if you pass the optimize=True keyword to einsum, since I have seen it make a massive difference in some situations. However, for arrays of this size it seems to have made things slightly worse, which might be explained by the time einsum spends figuring out a good contraction order, an overhead the relatively small array sizes here may not justify.
My rule of thumb is: if you can figure out a solution with matmul without any additional for loops, stick with it if your primary concern is performance. On the other hand, if your program has a bunch of for loops, give einsum a try, both with and without optimize=True. Even in that case, there are instances where a solution with a native for loop outperforms einsum, depending on the relative dimensions of the arrays; a small benchmark sketch follows below.
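As an illustration of that rule of thumb, here is a benchmark sketch comparing the three formulations on this problem (numbers are machine-dependent):
import numpy as np
from timeit import timeit

F = np.random.rand(2, 100, 65)

def with_matmul():
    aux = np.matmul(np.transpose(F, (2, 0, 1)), np.transpose(F, (2, 1, 0)))
    return np.transpose(aux, (1, 2, 0))

def with_einsum():
    return np.einsum("ijk,njk->ink", F, F)

def with_loop():
    V = np.empty((2, 2, F.shape[2]))
    for t in range(F.shape[2]):
        V[:, :, t] = F[:, :, t] @ F[:, :, t].T
    return V

for fn in (with_matmul, with_einsum, with_loop):
    print(fn.__name__, timeit(fn, number=1000))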
Upvotes: 1