Reputation: 6296
I have 2 matrices A(shape 10x10x36) and B(shape 10x27x36). I would like to multiply the last 2 axes and sum the result along axis 0 so that the result C is of the shape 10x27. Here is currently how I do it
C = []
for i in range(A.shape[0]):
C.append(np.matmul(A[i], B[i].T))
C = np.sum(np.array(C), axis=0)
I want to achieve this in a vectorized manner but can't seem to find out how. I have checked out np.einsum but not yet sure how to apply it to achieve the result. Any help will be appreciated. Thanks!
Upvotes: 1
Views: 579
Reputation: 53029
matmul
supports stacking. You can simply do:
([email protected](0,2,1)).sum(0)
Checks (C
is generated using OP's loop):
np.allclose(([email protected](0,2,1)).sum(0),C)
# True
timeit(lambda:([email protected](0,2,1)).sum(0),number=1000)
# 0.03199950899579562
# twice as fast as original loop
Upvotes: 2
Reputation: 224
You could also try the following using list comprehension. It's a bit more concise than what you are currently using.
C=np.array([A[i] @ B.T[:,:,i] for i in range(10)]).sum(0)
Upvotes: 0
Reputation: 6485
Here the same result using np.einsum
:
r1 = np.einsum('ijk,ilk->jl', A, B)
However in my machine the for loop implementation runs almost 2x faster:
def f(A,B):
C = []
for i in range(A.shape[0]):
C.append(np.matmul(A[i], B[i].T))
return np.sum(np.array(C), axis=0)
%timeit np.einsum('ijk,ilk->jl',A,B)
102 µs ± 3.79 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit f(A,B)
57.6 µs ± 1.7 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Upvotes: 3