Gabi
Gabi

Reputation: 453

With numpy, what is the fastest way to compute one solution to an underdetermined linear system?

With numpy, what is the fastest way to compute one solution to an underdetermined linear system? I don't care which solution the method would return, I'd be happy with any solution.

In particular, I'm dealing with a 7x7 rank-6 matrix which describes the dynamics of a physical system. I'm noticing numpy.linalg.lstsq, numpy.linalg.qr, scipy.linalg.null_space, and scipy.linalg.lu run on the full matrix are all slower on my machine than numpy.linalg.solve run on a correctly-trimmed 6x6 full-rank matrix; solve is twice as fast as lstsq (14.8 µs vs 29.1 µs).

Is there any way to speed up the computation without some horrible C LAPACK-level hacking?

Upvotes: 1

Views: 308

Answers (1)

Jérôme Richard
Jérôme Richard

Reputation: 50488

Numpy is not designed to be efficient on very small matrices. Its overheads (due to type checks, value checks, iterators, allocations, etc.) can be quite big on such matrices. In fact, dozens of microseconds is reasonable for such Numpy function call. Numba can reduce the overheads thanks to a fully compiled native code. That being said, Numba can still have a small overhead (due to the call from CPython, few type checks and allocations), but there are generally reasonable unless you work on extremely small inputs. In that case, it is better to use Numba in the caller function since the problem is actually the slow CPython interpreter. The lazy compilation of the Numba function make the first execution significantly slower. You can provide the signature to Numba to make it faster (eager compilation).

import numba as nb

@nb.njit('(float64[:,::1], float64[::1])')
def solve_nb(a, b):
    return np.linalg.solve(a, b)

On my machine. It is about 16% faster on a 7x7 matrix. It requires the matrices to be contiguous (working on non-contiguous is fundamentally inefficient, especially here). If this is not fast enough, then you can call dgesv directly for double-precision matrices (or sgesv for simple-precision).

Actually, solve does use dgesv internally. lstsq appears to use a singular value decomposition (SVD). SVD are significantly slower than a QR decomposition which is generally a bit slower than a LU decomposition.

I am not an expert of the numerical/mathematical part, but AFAIK, solving this with a LU decomposition is less numerically stable than using a QR which is also less numerically stable than a SVD. Also, I think a SVD/QR method should be used instead of a simple LU decomposition for matrices that are not full-rank one.

The implementation of dgesv of the standard Netlib LAPACK uses a LU factorization followed by a call to dgetrs (see here). This later call should be fast compared to the LU factorization. The code of LAPACK implementations are generally pretty generic so they may have significant overhead on 7x7 matrices (AFAIK, the Intel implementation is one of the fastest for that).

An alternative solution is to write your own specialized LU decomposition and your own system solving using Numba or Cython. This solution is tedious, but it should be significantly faster since the compiler can unroll the loop if it know the bounds reducing the overheads. You can also perform 1 allocation instead of multiple ones.

Upvotes: 2

Related Questions