saad

Reputation: 1245

Why is vectorized Pinv slower than unvectorized?

I have ten 6000 x 784 matrices. If I run np.cov and then pinv on each of them, the total time is:

%timeit for n in range(10): pinv(np.cov(A,rowvar = False))
1.64 s ± 78.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

If I instead loop only over np.cov, filling a 3D stack, and then compute the pinv of the whole stack, I get:

%timeit for n in range(10): dd[n] = np.cov(A,rowvar = False)
485 ms ± 18 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit pinv(dd)
4.59 s ± 369 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Why is pinv on the 3D stack so much slower?

What can I do to compute np.cov over a stack and the corresponding pinv efficiently?
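For reference, a setup along these lines reproduces the comparison (random data and smaller stand-in sizes so it runs quickly; the real matrices are 6000 x 784):

```python
import numpy as np
from numpy.linalg import pinv

rng = np.random.default_rng(0)
# Smaller stand-in for the real 6000 x 784 data matrix
A = rng.standard_normal((600, 78))

# Unvectorized: loop, one 78 x 78 pseudo-inverse at a time
pinvs_loop = [pinv(np.cov(A, rowvar=False)) for _ in range(10)]

# Vectorized: fill a 3D stack, then one pinv call on the whole stack
dd = np.empty((10, 78, 78))
for n in range(10):
    dd[n] = np.cov(A, rowvar=False)
pinvs_stack = pinv(dd)  # np.linalg.pinv broadcasts over the leading axis

print(np.allclose(pinvs_loop[0], pinvs_stack[0]))  # same numbers either way
```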

Upvotes: 2

Views: 200

Answers (1)

anki

Reputation: 765

This is due to the implementation. The limiting step is numpy's matmul. At first I thought the SVD was the limiting step, but it scales linearly (~130 ms per iteration, 1.3 s for the pack of ten).

Here are some parts of the profiling:

1 call (your first snippet)

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    1    0.131    0.131    0.131    0.131 linalg.py:1299(svd)
    1    0.011    0.011    0.011    0.011 {built-in method numpy.core.multiarray.matmul}
    1    0.003    0.003    0.144    0.144 <ipython-input-76-2a63f1c84429>:1(pinv)

1 call of a pack of 10 (your second snippet)

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    1    2.952    2.952    2.952    2.952 {built-in method numpy.core.multiarray.matmul}
    1    1.265    1.265    1.265    1.265 linalg.py:1299(svd)
    1    0.026    0.026    4.266    4.266 <ipython-input-76-2a63f1c84429>:1(pinv)
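Profiles like the ones above can be collected with cProfile; here is a sketch using a smaller, hypothetical stack (absolute numbers will differ from the listings):

```python
import cProfile
import pstats
import numpy as np
from numpy.linalg import pinv

rng = np.random.default_rng(0)
dd = rng.standard_normal((10, 200, 200))  # smaller stand-in stack

prof = cProfile.Profile()
prof.enable()
pinv(dd)  # one call on the whole 3D stack
prof.disable()

# Sort by time spent inside each function to see the hot spots
stats = pstats.Stats(prof)
stats.sort_stats("tottime").print_stats(5)
```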

Maybe there is an internal issue with the C-ordering (though recent numpy versions should take care of that), or it is just inefficient in this case... I briefly tried switching to the BLAS interface directly via the scipy implementation, but that operates on two-dimensional matrices only. These issues (that matmul is slow on stacked arrays) have actually been raised elsewhere.

Bottom line: do it non-vectorized, or try other implementations. Sometimes the fastest way to do matrix multiplication is the Eigen library's routines, but I have never tried it on a multi-dimensional problem.
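The non-vectorized route can still present a stack-in, stack-out interface; a minimal sketch (the helper name `pinv_stack` is my own, not a numpy function):

```python
import numpy as np
from numpy.linalg import pinv

def pinv_stack(dd):
    """Pseudo-invert each 2D slice of a 3D stack with separate 2D calls.

    Looping in Python over the leading axis keeps each call on the
    fast 2D code path instead of the batched 3D one.
    """
    out = np.empty_like(dd)
    for n in range(dd.shape[0]):
        out[n] = pinv(dd[n])
    return out

rng = np.random.default_rng(0)
dd = rng.standard_normal((10, 50, 50))
print(np.allclose(pinv_stack(dd), pinv(dd)))  # matches the batched result
```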

Upvotes: 1
