Reputation: 27
I am trying to solve a linear system spanning somewhat between hundred thousand and two hundred thousand equations with numpy.linalg.lstsq but is taking waaaaay too long. What can I do to speed this up?
The matrix is sparse with hundreds of columns (the dimensions are approximately 150 000 x 140) and the system is overdetermined.
Upvotes: 0
Views: 3192
Reputation: 761
If your coefficient matrix is sparse, use "spsolve" from "scipy.sparse.linalg".
Upvotes: 0
Reputation: 53089
Here is a bit of improvised trickery that speeds up the computation quite a bit on random data of the indicated dimensions.
I've no idea how numerically sound this is, though.
import numpy as np
from time import perf_counter
def lstsq(A, b):
AA = A.T @ A
bA = b @ A
D, U = np.linalg.eigh(AA)
Ap = (U * np.sqrt(D)).T
bp = bA @ U / np.sqrt(D)
return np.linalg.lstsq(Ap, bp, rcond=None)
# create random data
A = np.random.random((150_000, 140))
b = np.random.random((150_000,))
# use solver directly
t = perf_counter()
x, *info = np.linalg.lstsq(A, b, rcond=None)
s = perf_counter()
print('direct method: ', s-t, 'seconds')
# use equivalent reduced system
t = perf_counter()
x_acc, *info_acc = lstsq(A, b)
s = perf_counter()
print('accelerated method:', s-t, 'seconds')
print('results equal:', np.allclose(x, x_acc))
Sample run:
direct method: 3.032766239999546 seconds
accelerated method: 0.20947745100056636 seconds
results equal: True
Upvotes: 1