Reputation: 439
I have a matrix of dimension (nx2x2), so supposing the matrix M is (3x2x2):
[[[ 1.  5.]
  [ 2.  4.]]

 [[ 5. 25.]
  [10. 20.]]

 [[ 5. 25.]
  [10. 20.]]]
I would like to take the product (dot product) of all the 2x2 matrices in order. In other words, I want to compute

prod = M0 · M1 · ... · M(n-1)

where Mj is the same as M[j,:,:] in python. The easiest way to do this is with a for loop like this:
prod = np.identity(2)
for temp in M:
    prod = np.dot(prod, temp)
but I don't know if there is a vectorized way of doing this. (I'm sure this is probably a duplicate question, but I can't figure out the right thing to google.)
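One idea is a pairwise (tree) reduction: each pass multiplies adjacent pairs with a single batched np.matmul, so only O(log n) python-level calls are needed instead of n. A rough, untested sketch (chain_matmul_tree is just a name I made up; whether it beats the plain loop depends on n and the matrix size):

def chain_matmul_tree(M):
    # multiply adjacent pairs with one batched matmul per pass
    M = np.asarray(M)
    while M.shape[0] > 1:
        if M.shape[0] % 2:  # odd count: hold the last matrix out of this pass
            head, last = M[:-1], M[-1:]
            M = np.concatenate([np.matmul(head[::2], head[1::2]), last])
        else:
            M = np.matmul(M[::2], M[1::2])
    return M[0]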
Upvotes: 1
Views: 186
Reputation: 1073
If you just want a speedup, here is a solution (I am getting a 10x speedup on this computer):
import numpy as np
import numba as nb

big_array = np.random.rand(400, 400, 400)

def numpy_method(array3d):
    return np.linalg.multi_dot(array3d)

@nb.jit(nopython=True)
def numba_method(array3d):
    # each step depends on the previous product, so the loop is
    # inherently sequential; a plain range is used
    prod = np.identity(array3d.shape[2])
    for i in range(array3d.shape[0]):
        temp = array3d[i, :, :]
        prod = np.dot(prod, temp)
    return prod

numpy_result = numpy_method(big_array)
numba_result = numba_method(big_array)
print(np.all(np.isclose(numpy_result, numba_result)))
%timeit numpy_method(big_array)
9.66 s ± 142 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit numba_method(big_array)
923 ms ± 7.38 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
This does not scale very well to smaller arrays, but when the array is as large as this one, compiling the code with numba's LLVM-based JIT does make a noticeable difference.
Also, when I make the arrays bigger, the accumulated values overflow the float64 range and turn into inf. I'm guessing that your matrices won't run into this issue, though.
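If overflow ever does become a problem, one standard workaround (a sketch, not part of the timings above; scaled_chain is a name I made up) is to factor a running scale out of the product and accumulate its logarithm:

def scaled_chain(array3d):
    # keep the running product normalized and track the log of the
    # factored-out scale, so the entries never overflow float64
    prod = np.identity(array3d.shape[2])
    log_scale = 0.0
    for m in array3d:
        prod = prod @ m
        s = np.abs(prod).max()   # current magnitude (assumed nonzero)
        prod /= s
        log_scale += np.log(s)
    return prod, log_scale       # true product = prod * np.exp(log_scale)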
Upvotes: 2
Reputation: 231385
In [99]: X = np.array([1,5,2,4,5,25,10,20,5,25,10,20]).reshape(3,2,2)
In [100]: X
Out[100]:
array([[[ 1,  5],
        [ 2,  4]],

       [[ 5, 25],
        [10, 20]],

       [[ 5, 25],
        [10, 20]]])
In [101]: prod = np.identity(2)
In [102]: for t in X:
     ...:     prod = np.dot(prod, t)
     ...:
In [103]: prod
Out[103]:
array([[1525., 3875.],
       [1550., 3850.]])
With the matmul operator:
In [104]: X[0]@X[1]@X[2]
Out[104]:
array([[1525, 3875],
       [1550, 3850]])
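For arbitrary n, the same chain can also be spelled with functools.reduce (equivalent to chaining @ by hand, with X as above):

from functools import reduce
reduce(np.matmul, X)   # same result as X[0]@X[1]@X[2]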
A chained dot function:
In [106]: np.linalg.multi_dot(X)
Out[106]:
array([[1525, 3875],
       [1550, 3850]])
Check its docs and its code to see how it tries to optimize the calculation.
Some related SO questions:
- How is numpy multi_dot slower than numpy.dot?
- numpy large matrix multiplication optimize
- Multiply together list of matrices in Numpy
If I expand X several times:
In [147]: X.shape
Out[147]: (48, 2, 2)
The cost of one dot product:
In [148]: timeit X[0]@X[1]
5.08 µs ± 14.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
And a rough estimate for all 48 of them (about 5 µs each, so the result is in µs):
In [149]: 5*48
Out[149]: 240
Compare that to the cost of iteratively evaluating chained dot:
In [150]: %%timeit
     ...: Y = np.eye(X.shape[1]).astype(X.dtype)
     ...: for i in X:
     ...:     Y = Y@i
     ...:
227 µs ± 5.19 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
So the chaining does not cost anything beyond just evaluating all the required dots.
I tried a recursive function that does nested dots and didn't get any improvement.
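For reference, a sketch of the kind of recursive nesting meant there (nested_dot is my name for it, not the exact code tried):

def nested_dot(M):
    # split the chain in half and recurse; each leaf is still one 2x2 dot,
    # so the total number of dot calls is unchanged
    n = M.shape[0]
    if n == 1:
        return M[0]
    return nested_dot(M[:n // 2]) @ nested_dot(M[n // 2:])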
Unless you can, mathematically, come up with an alternative to evaluating the dots sequentially, I doubt there's room for improvement. Using something like cython to call the underlying BLAS functions directly might be the only alternative. It won't improve on the individual dots, but it could reduce some of the calling overhead.
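For illustration, scipy exposes the BLAS routines directly; a minimal sketch of that idea, assuming float64 matrices and the X from above (dgemm is the double-precision matrix-matrix product):

import numpy as np
from scipy.linalg import blas

prod = np.identity(2)
for m in X.astype(np.float64):       # dgemm expects float arrays
    prod = blas.dgemm(1.0, prod, m)  # computes 1.0 * (prod @ m)

Whether this actually reduces the per-call overhead compared to np.dot would need measuring.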
Upvotes: 3