In Python's Numpy, a dot product isn't equivalent to an einsum, and I'm not sure why not

Question

I've been chasing a bug all night and have finally fixed it, but I was obviously doing something wrong and I don't understand what. Consider:

import numpy as np

xs = np.arange(100 * 3).reshape(100, 3)  # shape (100, 3)
W = np.arange(3 * 17).reshape(3, 17)     # shape (3, 17)

a = np.einsum('df, hg -> dg', xs, W)     # f and h are two different labels
b = np.dot(xs, W)

In the above, a != b: the shapes match, but the values differ.

The problem turned out to be in the einsum subscripts. I wrote df, hg -> dg, but if I swap that h for an f, it works as expected:

a = np.einsum('df, fg -> dg', xs, W)
b = np.dot(xs, W)

Now, a == b.

What is the summation doing differently in the two cases? I'd expect them to be the same.

xnx · Accepted Answer

The correct way to do the matrix multiplication using np.einsum is to repeat the "middle" index (indicating summation over row times column), as you found:

a = np.array([[1,2],[3,4]])
b = np.array([[1,-2],[-0.4,3]])
np.einsum('df,fg->dg', a, b)
array([[ 0.2,  4. ],
       [ 1.4,  6. ]])

a.dot(b) 
array([[ 0.2,  4. ],
       [ 1.4,  6. ]])
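
To make the role of the repeated index concrete, here is a minimal sketch that writes 'df,fg->dg' out as plain loops. The helper name matmul_loops is made up for illustration and is not part of NumPy; the point is only that the shared label f ties the column of a to the row of b inside a single sum.

```python
import numpy as np

def matmul_loops(a, b):
    """Illustrative helper: what 'df,fg->dg' computes, written as loops."""
    n_d, n_f = a.shape
    n_f2, n_g = b.shape
    assert n_f == n_f2, "the shared index f must have the same length in both"
    out = np.zeros((n_d, n_g))
    for d in range(n_d):
        for g in range(n_g):
            # The same f indexes a's column and b's row: one sum, not two.
            for f in range(n_f):
                out[d, g] += a[d, f] * b[f, g]
    return out

a = np.array([[1, 2], [3, 4]])
b = np.array([[1, -2], [-0.4, 3]])

loops_result = matmul_loops(a, b)
matches_einsum = np.allclose(loops_result, np.einsum('df,fg->dg', a, b))
matches_dot = np.allclose(loops_result, a.dot(b))
```

Both matches_einsum and matches_dot should come out True for these inputs.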

If you don't repeat an index, and you keep all four labels in the output, you get every element of a multiplied by all of b (an outer product):

np.einsum('df, hg -> dfhg', a, b)

array([[[[  1. ,  -2. ],
         [ -0.4,   3. ]],

        [[  2. ,  -4. ],
         [ -0.8,   6. ]]],


       [[[  3. ,  -6. ],
         [ -1.2,   9. ]],

        [[  4. ,  -8. ],
         [ -1.6,  12. ]]]])

is the same as

a[:,:, None, None] * b
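
As a quick check of that equivalence, the sketch below compares the einsum outer product with the broadcasting form and with np.multiply.outer, which should also give the same four-dimensional array. The variable names are only illustrative.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[1, -2], [-0.4, 3]])

# No repeated label and nothing dropped from the output: no summation at all.
outer_einsum = np.einsum('df,hg->dfhg', a, b)

# Broadcasting: a becomes shape (2, 2, 1, 1) and is multiplied against b (2, 2).
outer_broadcast = a[:, :, None, None] * b

# The ufunc outer method builds the same array.
outer_ufunc = np.multiply.outer(a, b)

same_as_broadcast = np.allclose(outer_einsum, outer_broadcast)
same_as_ufunc = np.allclose(outer_einsum, outer_ufunc)
```

Element [d, f, h, g] of the result is simply a[d, f] * b[h, g].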

When you leave indices out of the output after the explicit -> operator, einsum sums over those axes. Here f and h are both omitted, so each one is summed independently:

np.einsum('df, hg -> dg', a, b)

array([[ 1.8,  3. ],
       [ 4.2,  7. ]])

is the same as:

np.einsum('df, hg -> dfhg', a, b).sum(axis=1).sum(axis=1)
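
Another way to read this, as a sketch rather than a statement about how einsum works internally: because f and h are independent labels, the double sum factors into (sum over f of a[d, f]) times (sum over h of b[h, g]). In other words, 'df,hg->dg' is the outer product of the row sums of a with the column sums of b, which is why it differs from a matrix product.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[1, -2], [-0.4, 3]])

row_sums = a.sum(axis=1)   # sum over f for each d
col_sums = b.sum(axis=0)   # sum over h for each g

# Outer product of the two 1-D sums gives a (d, g) array.
factored = np.outer(row_sums, col_sums)

independent = np.einsum('df,hg->dg', a, b)
same_as_factored = np.allclose(independent, factored)
differs_from_matmul = not np.allclose(independent, a.dot(b))
```

For these inputs, same_as_factored and differs_from_matmul should both be True.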

Here is a good guide to einsum (not mine).
