Reputation: 597
I'm looking for the algorithm here and trying to implement this code, but I'm getting different l2-norms for the resulting parameter vectors of the linear equation. Where am I mistaken in my attempt to adapt the code?
import numpy as np
from scipy import linalg
np.random.seed(123)
v = np.random.rand(4)
A = v[:,None] * v[None,:]
b = np.random.randn(4)
x = linalg.inv(A.T.dot(A)).dot(A.T).dot(b) #Usually not recommended because of Numerical Instability of the Normal Equations https://johnwlambert.github.io/least-squares/
l2_0= linalg.norm(A.dot(x) - b)
print("manually: ", l2_0)
x = linalg.lstsq(A, b)[0]
l2_1= linalg.norm(A.dot(x) - b)
print("scipy.linalg.lstsq: ", l2_1)
# 2-norm of two calculations compared
print(np.allclose(l2_0, l2_1, rtol=1.3e-1))
def direct_ls_svd(x, y):
    # append a column of 1s (these are the biases)
    x = np.column_stack([np.ones(x.shape[0]), x])
    # calculate the economy SVD for the data matrix x
    U, S, Vt = linalg.svd(x, full_matrices=False)
    # solve Ax = b for the best possible approximate solution in terms of least squares
    x_hat = Vt.T @ linalg.inv(np.diag(S)) @ U.T @ y
    #print(x_hat)
    # perform train and test inference
    #y_pred = x @ x_hat
    return y - x @ x_hat  #x_hat
x= direct_ls_svd(A, b)
l2_svd= linalg.norm(A.dot(x) - b)
print("svd: ", l2_svd)
# LU
x= linalg.solve(A.T@A, A.T@b)
l2_solve= linalg.norm(A.dot(x) - b)
print("scipy.linalg.solve: ", l2_solve)
# manually: 2.9751344995811313
# scipy.linalg.lstsq: 2.9286130558050654
# True
# svd: 6.830550019041984
# scipy.linalg.solve: 2.928613055805065
Is my error in the implementation of the SVD-decomposition algorithm for solving the least-squares problem, or perhaps in NumPy vs. SciPy rounding or precision differences? How can I correct the SVD algorithm for least squares so it becomes comparable with SciPy's? And will this algorithm be faster and less memory-consuming than iterative least-squares methods?
P.S. SVD applications (or here), i.e. SVD for PCA & PLS-SVD, are my final goal -- will the algorithm be the same as for the least-squares approximation? I'm confused by this question in general (even with code examples). Can somebody add some clarity for a newbie like me, please?
P.P.S. Applying such an implementation, I'm getting an even worse result for the l2-norm: svd: 10.031259300735462
P.P.P.S. I also lack some understanding of SVD in singular spectrum decomposition (if a reference exists): first for unsupervised dimensionality reduction and second for non-parametric time-series analysis, for practice.
P.P.P.P.S. PCA is used as a preliminary check for whether multicollinearity exists; otherwise strange (e.g. biased) results can be obtained. (If there is no collinearity, there is no sensitivity to estimation error, which shows up as a small condition number in the OLS analysis; conversely, a huge condition number indicates collinearity in multivariable/multidimensional regression.)
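For instance, a minimal check of the condition number and rank (just a sketch reusing the A from the snippet above, not part of the original code) makes the collinearity visible:
import numpy as np

np.random.seed(123)
v = np.random.rand(4)
A = v[:, None] * v[None, :]        # rank-1 outer product, so the columns are perfectly collinear
print(np.linalg.cond(A))           # enormous condition number (~1e16 or larger)
print(np.linalg.matrix_rank(A))    # 1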
Upvotes: 0
Views: 601
Reputation: 14654
The most important part here is to filter out the singular values that are (numerically) zero. For your example data, S is [9.22354602e-01 3.92705914e-17 1.10667017e-17 5.55744006e-18]; notice that you have one singular value of ~0.92 while the other three are tiny (< 1e-16).
If you attempt to reconstruct the solution using them, the small errors in some elements of Vt and U, which should all be of roughly the same magnitude, get divided by these tiny singular values and add up to a significant error in the result.
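To see the amplification directly, you can look at the reciprocals that enter the pseudo-inverse (a quick sketch reusing the A from the question; this is not part of the original answer):
import numpy as np
from scipy import linalg

np.random.seed(123)
v = np.random.rand(4)
A = v[:, None] * v[None, :]

U, S, Vt = linalg.svd(A, full_matrices=False)
print(S)        # one value ~0.92, the other three around 1e-17 or smaller
print(1.0 / S)  # the tiny values become factors of roughly 1e16-1e17 that amplify any noise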
What can be done in this case is to treat any singular value that is small enough as an exact zero. In the following modified version of your function I assume that all singular values smaller than rcond times the maximum singular value should be zero. I then compute a mask m and drop the corresponding entries of S, columns of U, and rows of Vt.
def direct_ls_svd(A, b, rcond=1e-7):
    # calculate the economy SVD of the data matrix A
    U, S, Vt = linalg.svd(A, full_matrices=False)
    # mask to remove the (numerically) zero singular values
    m = (abs(S) / np.max(abs(S))) > rcond
    U, S, Vt = U[:, m], S[m], Vt[m, :]
    assert np.allclose(U @ np.diag(S) @ Vt, A)
    # solve Ax = b for the best possible approximate solution
    # in terms of least squares
    x_hat = Vt.T @ ((U.T @ b) / S)
    return x_hat
An alternative solution is to set the small singular values to zero in place instead of dropping the corresponding rows and columns; then you avoid an extra copy in the worst case, but you lose the potential savings from the reduced number of multiplications in the very low-rank cases.
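A minimal sketch of that alternative (the name direct_ls_svd_zeroed is just illustrative, not from the answer above): keep U, S and Vt at full size and zero out the reciprocals of the discarded singular values instead:
import numpy as np
from scipy import linalg

def direct_ls_svd_zeroed(A, b, rcond=1e-7):
    # economy SVD, as before
    U, S, Vt = linalg.svd(A, full_matrices=False)
    # mask of the singular values considered nonzero
    m = (abs(S) / np.max(abs(S))) > rcond
    # keep the factors full-size; just drop the contribution of the tiny values
    S_inv = np.zeros_like(S)
    S_inv[m] = 1.0 / S[m]
    return Vt.T @ (S_inv * (U.T @ b))
For this example it should return the same x_hat as the masked version, only without shrinking the matrices.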
Upvotes: 1