Reputation: 4624
I am looking for a matrix operation in numpy that would speed up the following calculation.
I have two 3D arrays A and B. The first dimension indexes the example, and both arrays hold n_examples examples. What I want is to take the dot product of each corresponding pair of examples in A and B and sum the results:
import numpy as np

n_examples = 10
A = np.random.randn(n_examples, 20, 30)
B = np.random.randn(n_examples, 30, 5)

sum = np.zeros([20, 5])
for i in range(len(A)):
    sum += np.dot(A[i], B[i])
Upvotes: 3
Views: 3461
Reputation: 58915
This is a typical application for np.tensordot():
sum = np.tensordot(A, B, [[0,2],[0,1]])
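If the axis pairing looks opaque, a quick sanity check confirms it matches the loop in the question (using `np.random.default_rng` here only for reproducibility; the shapes are the ones from the question):

```python
import numpy as np

rng = np.random.default_rng(0)
n_examples = 10
A = rng.standard_normal((n_examples, 20, 30))
B = rng.standard_normal((n_examples, 30, 5))

# Reference: explicit loop over examples
loop_sum = np.zeros((20, 5))
for i in range(n_examples):
    loop_sum += np.dot(A[i], B[i])

# tensordot contracts A's axes 0 and 2 against B's axes 0 and 1:
# axis 0 is the example index n, axis 2 of A / axis 1 of B is k
td_sum = np.tensordot(A, B, axes=[[0, 2], [0, 1]])

print(np.allclose(loop_sum, td_sum))  # True
```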
Timing
Using the following code:
import numpy as np

n_examples = 100
A = np.random.randn(n_examples, 20, 30)
B = np.random.randn(n_examples, 30, 5)

def sol1():
    sum = np.zeros([20, 5])
    for i in range(len(A)):
        sum += np.dot(A[i], B[i])
    return sum

def sol2():
    # list() is needed on Python 3, where map() returns an iterator
    return np.array(list(map(np.dot, A, B))).sum(0)

def sol3():
    return np.einsum('nmk,nkj->mj', A, B)

def sol4():
    return np.tensordot(A, B, [[2, 0], [1, 0]])

def sol5():
    return np.tensordot(A, B, [[0, 2], [0, 1]])
Results:
timeit sol1()
1000 loops, best of 3: 1.46 ms per loop
timeit sol2()
100 loops, best of 3: 4.22 ms per loop
timeit sol3()
1000 loops, best of 3: 1.87 ms per loop
timeit sol4()
10000 loops, best of 3: 205 µs per loop
timeit sol5()
10000 loops, best of 3: 172 µs per loop
On my computer tensordot() was the fastest solution, and changing the order in which the axes are evaluated changed neither the results nor the performance.
Upvotes: 4
Reputation: 54340
Ha, it can be done in just one line: np.einsum('nmk,nkj->mj', A, B).
See Einstein summation: http://docs.scipy.org/doc/numpy/reference/generated/numpy.einsum.html
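To see why the subscripts do the job: k is contracted within each example, and n is summed out because it does not appear on the right-hand side of `->`. A small check against a per-example reference (shapes taken from the question):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((10, 20, 30))
B = rng.standard_normal((10, 30, 5))

# 'nmk,nkj->mj': k is contracted per example; n is summed out
# because it is absent from the output subscripts.
es = np.einsum('nmk,nkj->mj', A, B)

# Reference: dot each pair of examples, then sum
ref = sum(np.dot(a, b) for a, b in zip(A, B))

print(np.allclose(es, ref))  # True
```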
Not the same problem, but the idea is much the same; see the discussion and alternative methods in this related topic: numpy multiply matrices preserve third axis
Don't name your variable sum, as that shadows the built-in sum.
As @Jaime pointed out, the loop is actually faster for arrays of this size. In fact a solution based on map and sum is, albeit simpler, even slower:
In [19]:
%%timeit
SUM = np.zeros([20,5])
for i in range(len(A)):
    SUM += np.dot(A[i], B[i])
10000 loops, best of 3: 115 µs per loop
In [20]:
%timeit np.array(list(map(np.dot, A, B))).sum(0)
1000 loops, best of 3: 445 µs per loop
In [21]:
%timeit np.einsum('nmk,nkj->mj',A,B)
1000 loops, best of 3: 259 µs per loop
Things are different with larger dimensions:

n_examples = 1000
A = np.random.randn(n_examples, 20, 1000)
B = np.random.randn(n_examples, 1000, 5)
And:
In [46]:
%%timeit
SUM = np.zeros([20,5])
for i in range(len(A)):
    SUM += np.dot(A[i], B[i])
1 loops, best of 3: 191 ms per loop
In [47]:
%timeit np.array(list(map(np.dot, A, B))).sum(0)
1 loops, best of 3: 164 ms per loop
In [48]:
%timeit np.einsum('nmk,nkj->mj',A,B)
1 loops, best of 3: 451 ms per loop
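On NumPy >= 1.10 there is also the stacked-matmul route, which was not part of the benchmarks above: `A @ B` (or np.matmul) performs a batched matrix product over the leading axis, after which a single sum over axis 0 reproduces the loop. A sketch, not a claim about its timing:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((1000, 20, 100))
B = rng.standard_normal((1000, 100, 5))

# A @ B broadcasts the matrix product over the leading (example)
# axis, giving shape (1000, 20, 5); summing axis 0 matches the loop.
mm = (A @ B).sum(axis=0)

# Reference loop
ref = np.zeros((20, 5))
for i in range(len(A)):
    ref += np.dot(A[i], B[i])

print(np.allclose(mm, ref))  # True
```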
Upvotes: 2