How is numpy multi_dot slower than numpy.dot?

Question

I'm trying to optimize some code that performs lots of sequential matrix operations.

I assumed numpy.linalg.multi_dot would perform all the operations in C or BLAS and would therefore be much faster than chaining calls like arr1.dot(arr2).dot(arr3) and so on.

I was really surprised when I ran this code in a notebook:

import numpy as np

v1 = np.random.rand(2,2)
v2 = np.random.rand(2,2)

%%timeit
v1.dot(v2.dot(v1.dot(v2)))

The slowest run took 9.01 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 3.14 µs per loop



%%timeit
np.linalg.multi_dot([v1,v2,v1,v2])

The slowest run took 4.67 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 32.9 µs per loop

The same operation turned out to be about 10x slower using multi_dot.
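For anyone who wants to reproduce this outside a notebook, here is a minimal sketch using the standard timeit module. The function names chained and with_multi_dot, the fixed seed, and the loop count are my own choices for illustration, and the absolute timings will differ by machine and NumPy version.

```python
import timeit

import numpy as np

# Fixed seed is an assumption, used only so repeated runs see the same inputs
rng = np.random.default_rng(0)
v1 = rng.random((2, 2))
v2 = rng.random((2, 2))


def chained():
    # Explicitly nested calls to ndarray.dot, as in the first notebook cell
    return v1.dot(v2.dot(v1.dot(v2)))


def with_multi_dot():
    # Same product, letting multi_dot pick the evaluation order
    return np.linalg.multi_dot([v1, v2, v1, v2])


# Both expressions should agree up to floating-point rounding
same_result = np.allclose(chained(), with_multi_dot())

n = 10_000  # calls per timing run (arbitrary choice)
t_chained = min(timeit.repeat(chained, number=n, repeat=3)) / n
t_multi = min(timeit.repeat(with_multi_dot, number=n, repeat=3)) / n

print(f"same result: {same_result}")
print(f"chained dot: {t_chained * 1e6:.2f} us per call")
print(f"multi_dot:   {t_multi * 1e6:.2f} us per call")
```

Changing the shapes passed to rng.random makes it easy to check whether the gap holds for larger or non-square matrices.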

My questions are:

Am I missing something? Does this result make sense?
Is there another way to optimize sequential matrix operations?
Should I expect the same behavior using Cython?

