gerardlouw

Reputation: 201

Combining element-wise and matrix multiplication with multi-dimensional arrays in NumPy

I have two multidimensional NumPy arrays, A and B, with A.shape = (K, d, N) and B.shape = (K, N, d). I would like to perform an element-wise operation over axis 0 (K), with that operation being matrix multiplication over axes 1 and 2 (d, N and N, d). So the result should be a multidimensional array C with C.shape = (K, d, d), so that C[k] = np.dot(A[k], B[k]). A naive implementation would look like this:

C = np.vstack([np.dot(A[k], B[k])[np.newaxis, :, :] for k in range(K)])

but this implementation is slow. A slightly faster approach looks like this:

C = np.dot(A, B)[:, :, 0, :]

which uses the default behaviour of np.dot on multidimensional arrays, giving me an array with shape (K, d, K, d). However, this approach computes the required answer K times (each of the entries along axis 2 is the same). Asymptotically it will be slower than the first approach, but the overhead is much less. I am also aware of the following approach:

from numpy.core.umath_tests import matrix_multiply
C = matrix_multiply(A, B)

but I am not guaranteed that this function will be available. My question is thus, does NumPy provide a standard way of doing this efficiently? An answer which applies to multidimensional arrays in general would be perfect, but an answer specific to only this case would be great too.

Edit: As pointed out by @Juh_, the second approach is incorrect. The correct version is:

C = np.dot(A, B).diagonal(axis1=0, axis2=2).transpose(2, 0, 1)

but the overhead added makes it slower than the first approach, even for small matrices. The last approach is winning by a long shot on all my timing tests, for small and large matrices. I'm now strongly considering using this if no better solution crops up, even if that would mean copying the numpy.core.umath_tests library (written in C) into my project.
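For reference, the equivalence of the corrected diagonal-based version and the naive loop can be checked with a small sketch (the shapes below are chosen arbitrarily):

```python
import numpy as np

K, d, N = 8, 4, 6
rng = np.random.default_rng(0)
A = rng.random((K, d, N))
B = rng.random((K, N, d))

# Naive per-slice loop: C[k] = A[k] @ B[k]
C_loop = np.stack([np.dot(A[k], B[k]) for k in range(K)])

# np.dot on the stacked arrays gives shape (K, d, K, d); keep only the
# k == k slices via diagonal(), then move the K axis back to the front.
C_diag = np.dot(A, B).diagonal(axis1=0, axis2=2).transpose(2, 0, 1)

print(np.allclose(C_loop, C_diag))  # True
```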

Upvotes: 6

Views: 2909

Answers (4)

Vasyl Zacheshyhryva

Reputation: 1

You can do

np.matmul(A, B)

Look at https://numpy.org/doc/stable/reference/generated/numpy.matmul.html.

It should be faster than einsum for large enough K.
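A minimal, runnable sketch (arbitrary shapes) showing that np.matmul performs the per-k matrix products directly on the stacked arrays:

```python
import numpy as np

K, d, N = 8, 4, 6
rng = np.random.default_rng(0)
A = rng.random((K, d, N))
B = rng.random((K, N, d))

# matmul broadcasts over the leading axis: C[k] = A[k] @ B[k]
C = np.matmul(A, B)  # equivalently: C = A @ B
print(C.shape)       # (8, 4, 4)

# Check against an explicit per-slice loop
C_loop = np.stack([A[k] @ B[k] for k in range(K)])
print(np.allclose(C, C_loop))  # True
```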

Upvotes: 0

Bernardo Costa

Reputation: 519

A very flexible, compact, and fast solution:

C = np.einsum('Kab,Kbc->Kac', A, B, optimize=True)

Confirmation:

import numpy as np
K = 10
d = 5
N = 3
A = np.random.rand(K,d,N)
B = np.random.rand(K,N,d)
C_old = np.dot(A, B).diagonal(axis1=0, axis2=2).transpose(2, 0, 1)
C_new = np.einsum('Kab,Kbc->Kac', A, B)
print(np.max(C_old-C_new))  # should be 0 or a very small number

For large multi-dimensional arrays, the optional parameter optimize=True can save you a lot of time. You can learn about einsum here:

https://ajcr.net/Basic-guide-to-einsum/

https://rockt.github.io/2018/04/30/einsum

https://numpy.org/doc/stable/reference/generated/numpy.einsum.html

Quote:

The Einstein summation convention can be used to compute many multi-dimensional, linear algebraic array operations. einsum provides a succinct way of representing these. A non-exhaustive list of these operations is:

  • Trace of an array, numpy.trace.

  • Return a diagonal, numpy.diag.

  • Array axis summations, numpy.sum.

  • Transpositions and permutations, numpy.transpose.

  • Matrix multiplication and dot product, numpy.matmul numpy.dot.

  • Vector inner and outer products, numpy.inner numpy.outer.

  • Broadcasting, element-wise and scalar multiplication, numpy.multiply.

  • Tensor contractions, numpy.tensordot.

  • Chained array operations, in efficient calculation order, numpy.einsum_path.

Upvotes: 0

Bi Rico

Reputation: 25833

I have this same issue in my project. The best I've been able to come up with is the following, which I think is a little faster (maybe 10%) than using vstack:

K, d, N = A.shape
C = np.empty((K, d, d))
for k in range(K):
    C[k] = np.dot(A[k], B[k])

I'd love to see a better solution; I can't quite see how one would use tensordot to do this.

Upvotes: 1

Juh_

Reputation: 15569

A possible solution to your problem is:

C = np.sum(A[:,:,:,np.newaxis]*B[:,np.newaxis,:,:],axis=2)

However:

  1. it is quicker than the vstack approach only if K is much bigger than d and N
  2. there might be a memory issue: the above solution allocates a K×d×N×d array (i.e. all possible product pairs, before summing). I actually could not test with big K, d, and N, as I was running out of memory.
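The broadcasting-and-sum approach can be written out and checked against a per-slice loop like this (a small sketch with arbitrary shapes):

```python
import numpy as np

K, d, N = 8, 4, 6
rng = np.random.default_rng(0)
A = rng.random((K, d, N))
B = rng.random((K, N, d))

# Broadcast to a (K, d, N, d) array of all pairwise products, then sum
# out the shared N axis. The intermediate holds K*d*N*d elements, which
# is what can exhaust memory for large inputs.
C = np.sum(A[:, :, :, np.newaxis] * B[:, np.newaxis, :, :], axis=2)

C_loop = np.stack([np.dot(A[k], B[k]) for k in range(K)])
print(np.allclose(C, C_loop))  # True
```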

btw, note that:

C = np.dot(A, B)[:, :, 0, :]

does not give the correct result. It tricked me, because I first checked my method by comparing its results to those given by this np.dot command.

Upvotes: 3
