Conventional wisdom states that if you are solving Ax = b several times with the same A and a different b, you should use an LU factorization of A. If I factor once with
p, l, u = scipy.linalg.lu(A)
and then solve several times in a loop with
x = scipy.linalg.solve(l, p.T@b)
x = scipy.linalg.solve(u, x)
this ends up being much slower than just using
x = scipy.linalg.solve(A, b)
in the loop. Is scipy.linalg.solve()
not optimized to solve upper- and lower-triangular systems using forward and backward substitution? Or is there some trickery where Python recognizes that it can reuse the LU factorization across the scipy.linalg.solve
calls?
I know there are linalg.lu_factor and linalg.lu_solve routines in scipy, but I wanted to stay away from those, since this is supposed to be a pedagogical example.
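For context, the kind of pedagogical forward/backward substitution I have in mind looks like this (a plain-NumPy sketch of the textbook triangular solves; the function names forward_sub/back_sub are just for illustration, not the code I timed):

```python
import numpy as np
from scipy.linalg import lu

def forward_sub(L, b):
    """Solve L y = b for lower-triangular L by forward substitution."""
    n = L.shape[0]
    y = np.zeros(n)
    for i in range(n):
        y[i] = (b[i] - L[i, :i] @ y[:i]) / L[i, i]
    return y

def back_sub(U, y):
    """Solve U x = y for upper-triangular U by backward substitution."""
    n = U.shape[0]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]
    return x

A = np.array([[2., 5., 8., 7.], [5., 2., 2., 8.],
              [7., 5., 6., 6.], [5., 4., 4., 8.]])
b = np.arange(4.0)
p, l, u = lu(A)                      # A = p @ l @ u
x = back_sub(u, forward_sub(l, p.T @ b))
print(np.allclose(A @ x, b))         # True
```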
Upvotes: 1
Views: 718
Most of my linear algebra studies were pre-computer (or at least pre-MATLAB/Python). But I can read the docs.
In [29]: from scipy import linalg as la
Starting with a sample array from the la.lu docs:
In [30]: A = np.array([[2, 5, 8, 7], [5, 2, 2, 8], [7, 5, 6, 6], [5, 4, 4, 8]])
In [31]: p, l, u = la.lu(A)
In [32]: p
Out[32]:
array([[0., 1., 0., 0.],
       [0., 0., 0., 1.],
       [1., 0., 0., 0.],
       [0., 0., 1., 0.]])
In [33]: l
Out[33]:
array([[ 1.        ,  0.        ,  0.        ,  0.        ],
       [ 0.28571429,  1.        ,  0.        ,  0.        ],
       [ 0.71428571,  0.12      ,  1.        ,  0.        ],
       [ 0.71428571, -0.44      , -0.46153846,  1.        ]])
In [34]: u
Out[34]:
array([[ 7.        ,  5.        ,  6.        ,  6.        ],
       [ 0.        ,  3.57142857,  6.28571429,  5.28571429],
       [ 0.        ,  0.        , -1.04      ,  3.08      ],
       [ 0.        ,  0.        ,  0.        ,  7.46153846]])
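These factors satisfy A = p @ l @ u (scipy's la.lu puts the permutation on the left), which is why the triangular solves use p.T @ b. A quick check:

```python
import numpy as np
from scipy.linalg import lu

A = np.array([[2, 5, 8, 7], [5, 2, 2, 8], [7, 5, 6, 6], [5, 4, 4, 8]])
p, l, u = lu(A)

# scipy's convention: A = P @ L @ U, hence P.T @ A = L @ U
print(np.allclose(p @ l @ u, A))    # True
print(np.allclose(p.T @ A, l @ u))  # True
```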
In [42]: b=np.arange(4)
In [43]: la.solve(A,b)
Out[43]: array([-0.21649485, 2.54639175, -1.54639175, 0.01030928])
In [44]: timeit la.solve(A,b)
43.5 µs ± 88.5 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
I see docs for la.solve_triangular. After a bit of trial and error I got:
In [46]: la.solve_triangular(u,la.solve_triangular(l,p.T@b, lower=True))
Out[46]: array([-0.21649485, 2.54639175, -1.54639175, 0.01030928])
and timing it:
In [47]: timeit la.solve_triangular(u,la.solve_triangular(l,p.T@b, lower=True))
83 µs ± 2.6 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
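Some of that gap is per-call overhead. solve_triangular takes check_finite=False to skip input validation, and since the l from la.lu has ones on its diagonal, unit_diagonal=True also applies. A sketch (the exact savings will vary by machine and matrix size):

```python
import numpy as np
from scipy.linalg import lu, solve_triangular

A = np.array([[2., 5., 8., 7.], [5., 2., 2., 8.],
              [7., 5., 6., 6.], [5., 4., 4., 8.]])
b = np.arange(4.0)
p, l, u = lu(A)

# skip the NaN/inf input checks and exploit l's unit diagonal
y = solve_triangular(l, p.T @ b, lower=True,
                     unit_diagonal=True, check_finite=False)
x = solve_triangular(u, y, check_finite=False)
print(np.allclose(A @ x, b))  # True
```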
So two calls to solve_triangular are slower than one solve, but faster than two calls to a solve that doesn't know its arrays are triangular:
In [48]: la.solve(u,la.solve(l,p.T@b))
Out[48]: array([-0.21649485, 2.54639175, -1.54639175, 0.01030928])
In [49]: timeit la.solve(u,la.solve(l,p.T@b))
137 µs ± 342 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
I don't know how these timings will scale to larger arrays.
Testing @Warren's lu_solve idea (from a now-deleted answer, https://stackoverflow.com/a/64473976/901925):
In [50]: lu_and_piv = la.lu_factor(A)
In [51]: lu_and_piv
Out[51]:
(array([[ 7.        ,  5.        ,  6.        ,  6.        ],
        [ 0.28571429,  3.57142857,  6.28571429,  5.28571429],
        [ 0.71428571,  0.12      , -1.04      ,  3.08      ],
        [ 0.71428571, -0.44      , -0.46153846,  7.46153846]]),
 array([2, 2, 3, 3], dtype=int32))
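Note that piv is in LAPACK getrf form: entry i records that row i was swapped with row piv[i] during elimination; it is not itself a permutation vector. A sketch of unpacking the packed factor and the swaps (the perm variable here is just for illustration):

```python
import numpy as np
from scipy.linalg import lu_factor

A = np.array([[2., 5., 8., 7.], [5., 2., 2., 8.],
              [7., 5., 6., 6.], [5., 4., 4., 8.]])
lu_mat, piv = lu_factor(A)

# replay the recorded row swaps to recover the row permutation
perm = np.arange(len(piv))
for i, j in enumerate(piv):
    perm[[i, j]] = perm[[j, i]]

# lu_mat stores U on and above the diagonal, L (unit diagonal) below it
L = np.tril(lu_mat, k=-1) + np.eye(len(piv))
U = np.triu(lu_mat)
print(np.allclose(L @ U, A[perm]))  # True
```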
In [52]: la.lu_solve(lu_and_piv, b)
Out[52]: array([-0.21649485, 2.54639175, -1.54639175, 0.01030928])
In [53]: timeit la.lu_solve(lu_and_piv, b)
7.47 µs ± 14.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
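Putting it together, the factor-once, solve-many pattern the question describes would look like this with lu_factor/lu_solve (a sketch; the right-hand sides here are just the columns of the identity, to sanity-check each solve):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[2., 5., 8., 7.], [5., 2., 2., 8.],
              [7., 5., 6., 6.], [5., 4., 4., 8.]])

lu_and_piv = lu_factor(A)          # O(n^3) factorization, done once
for b in np.eye(4):                # many right-hand sides
    x = lu_solve(lu_and_piv, b)    # O(n^2) per solve
    assert np.allclose(A @ x, b)
```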
Upvotes: 3