Oscar Muñoz

Reputation: 439

Product operator of n matrices without a loop

I have an array of dimension (n×2×2), so suppose the matrix M is (3×2×2):

[[[ 1.  5.]
  [ 2.  4.]]

 [[ 5. 25.]
  [10. 20.]]

 [[ 5. 25.]
  [10. 20.]]]

I would like to do a product operator (dot product) of each 2×2 matrix. In other words, I want to do the following:

[image: prod = M_0 · M_1 · ... · M_{n-1}, the ordered product of the n matrices]

where Mj is the same as M[j,:,:] in python. The easiest way to do this is with a for loop like this:

prod=np.identity(2)
for temp in M:
    prod=np.dot(prod,temp)

but I don't know if there is a vectorized way of doing this (I am sure that this is probably a repeated question but I can't figure out the right question to google)
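The loop can at least be hidden behind functools.reduce, which chains np.matmul over the stack (it is still sequential under the hood, not truly vectorized). A sketch using the 3×2×2 example above:

```python
import functools
import numpy as np

# The example 3x2x2 stack from the question.
M = np.array([[[1., 5.], [2., 4.]],
              [[5., 25.], [10., 20.]],
              [[5., 25.], [10., 20.]]])

# Chain the 2x2 matrices: M[0] @ M[1] @ M[2].
prod = functools.reduce(np.matmul, M)
print(prod)  # [[1525. 3875.] [1550. 3850.]]
```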

Upvotes: 1

Views: 186

Answers (2)

pythonweb

Reputation: 1073

If you just want a speedup, here is a solution (I am getting a 10x speed-up on this computer):

import numpy as np
import numba as nb

big_array = np.random.rand(400, 400, 400)

def numpy_method(array3d):
    return np.linalg.multi_dot(array3d)

@nb.jit(nopython=True)
def numba_method(array3d):
    prod = np.identity(array3d.shape[2])
    for i in nb.prange(array3d.shape[0]):
        temp = array3d[i, :, :]
        prod = np.dot(prod, temp)

    return prod

numpy_result = numpy_method(big_array)
numba_result = numba_method(big_array)

print(np.all(np.isclose(numpy_result, numba_result)))

%timeit numpy_method(big_array)
9.66 s ± 142 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit numba_method(big_array)
923 ms ± 7.38 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

This does not scale very well to smaller arrays, but when the array is as large as you were saying, compiling the code with the LLVM compiler does make a noticeable difference.

Also, when I make the arrays bigger, the intermediate values overflow the float64 range and turn to inf. I'm guessing that your matrices won't run into this issue, though.
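That overflow is just float64 running out of range: once an intermediate product exceeds roughly 1.8e308, it becomes inf. A tiny illustration (not the code above, just the mechanism):

```python
import numpy as np

# float64 tops out near 1.8e308; squaring 1e200 exceeds that.
a = np.diag([1e200, 1e200])
b = a @ a  # diagonal entries would be 1e400 -> inf
print(b[0, 0])  # inf
```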

Upvotes: 2

hpaulj

Reputation: 231385

In [99]: X = np.array([1,5,2,4,5,25,10,20,5,25,10,20]).reshape(3,2,2)
In [100]: X
Out[100]: 
array([[[ 1,  5],
        [ 2,  4]],

       [[ 5, 25],
        [10, 20]],

       [[ 5, 25],
        [10, 20]]])
In [101]: prod=np.identity(2)
In [102]: for t in X:
     ...:     prod=np.dot(prod,t)
     ...:     
In [103]: prod
Out[103]: 
array([[1525., 3875.],
       [1550., 3850.]])

With the matmul operator (@):

In [104]: X[0]@X[1]@X[2]
Out[104]: 
array([[1525, 3875],
       [1550, 3850]])
In [105]: 

A chained dot function:

In [106]: np.linalg.multi_dot(X)
Out[106]: 
array([[1525, 3875],
       [1550, 3850]])

Check its docs and its code to see how it tries to optimize the calculation.
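Note that multi_dot's order optimization only pays off when the chained matrices have different shapes; for a uniform n×2×2 stack every parenthesization costs the same. A quick sketch with mixed shapes (illustrative sizes only) where the order does matter:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((10, 100))
B = rng.random((100, 5))
C = rng.random((5, 50))

# multi_dot picks the cheapest parenthesization, here (A @ B) @ C:
# shrinking to a 10x5 intermediate is far cheaper than forming B @ C first.
out = np.linalg.multi_dot([A, B, C])
print(np.allclose(out, A @ B @ C))  # True
```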

How is numpy multi_dot slower than numpy.dot?

numpy large matrix multiplication optimize

Multiply together list of matrices in Numpy


If I expand X several times:

In [147]: X.shape
Out[147]: (48, 2, 2)

The cost of one dot product:

In [148]: timeit X[0]@X[1]
5.08 µs ± 14.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

And for 48 of them:

In [149]: 5*48
Out[149]: 240

Compare that to the cost of iteratively evaluating chained dot:

In [150]: %%timeit
     ...: Y = np.eye(X.shape[1]).astype(X.dtype)
     ...: for i in X:
     ...:     Y = Y@i
     ...:     
227 µs ± 5.19 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

So the chaining does not cost anything beyond evaluating all the required dots.

I tried a recursive function that does nested dots, and didn't get any improvement.
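Such a recursive attempt might look like this sketch (a hypothetical helper, not the exact code tried): it splits the chain in half, multiplies the two halves' products, and bottoms out at a single matrix. The total number of 2×2 multiplications is the same, which is why it gains nothing here.

```python
import numpy as np

def chain_dot(stack):
    """Divide-and-conquer product of a stack of square matrices."""
    n = len(stack)
    if n == 1:
        return stack[0]
    mid = n // 2
    # Multiply the products of the two halves; order is preserved.
    return chain_dot(stack[:mid]) @ chain_dot(stack[mid:])

X = np.array([[[1., 5.], [2., 4.]],
              [[5., 25.], [10., 20.]],
              [[5., 25.], [10., 20.]]])
print(chain_dot(X))  # same result as the sequential loop
```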

Unless you can, mathematically, come up with an alternative to evaluating the dots sequentially I doubt if there's room for improvement. Using something like cython to directly call the underlying BLAS functions might be the only alternative. It won't improve on individual dots, but could reduce some of the calling overhead.

Upvotes: 3
