Reputation: 527
I am using the numpy linalg routine lstsq to solve system of equations. My A matrix is size of (11046, 504) while my B matrix is size (11046, 1), and the rank determined is 249, so about half of the solved for x array isn't particularly useful. I'd like to use the s array of singular values to zero out the solved for parameters that correspond to the singular values, but it seems that the s array is sorted in order of decreasing statistical significance. Is there a way I can figure out which of my x's correspond to each singular value s?
Upvotes: 5
Views: 2152
Reputation: 1862
To obtain the least-squares solution to the equation Mb = x
as given by numpy.linalg.lstsq
, you can also use numpy.linalg.svd
, which calculates the singular-value decomposition M= U S V*
. The best solution x
is then given as x = V Sp U* b
, where Sp
is the pseudo-inverse of S
. Given the matrices U
and V*
(containing the left- and right-singular vectors of your matrix M
) and the singular values s
, you can calculate the vector z=V*x
. Now, all components z_i
of z
with i > rank(M)
can be chosen arbitarily without altering the solution, as can thus all components x_j
which are not contained in z_i
for i <= rank(M)
.
Here's an example that demonstrates how to obtain the significant components of x
, using sample data from the Wikipedia entry on singluar value decomposition:
import numpy as np
M = np.array([[1,0,0,0,2],[0,0,3,0,0],[0,0,0,0,0],[0,4,0,0,0]])
#We perform singular-value decomposition of M
U, s, V = np.linalg.svd(M)
S = np.zeros(M.shape,dtype = np.float64)
b = np.array([1,2,3,4])
m = min(M.shape)
#We generate the matrix S (Sigma) from the singular values s
S[:m,:m] = np.diag(s)
#We calculate the pseudo-inverse of S
Sp = S.copy()
for m in range(0,m):
Sp[m,m] = 1.0/Sp[m,m] if Sp[m,m] != 0 else 0
Sp = np.transpose(Sp)
Us = np.matrix(U).getH()
Vs = np.matrix(V).getH()
print "U:\n",U
print "V:\n",V
print "S:\n",S
print "U*:\n",Us
print "V*:\n",Vs
print "Sp:\n",Sp
#We obtain the solution to M*x = b using the singular-value decomposition of the matrix
print "numpy.linalg.svd solution:",np.dot(np.dot(np.dot(Vs,Sp),Us),b)
#This will print:
#numpy.linalg.svd solution: [[ 0.2 1. 0.66666667 0. 0.4 ]]
#We compare the solution to np.linalg.lstsq
x,residuals,rank,s = np.linalg.lstsq(M,b)
print "numpy.linalg.lstsq solution:",x
#This will print:
#numpy.linalg.lstsq solution: [ 0.2 1. 0.66666667 0. 0.4 ]
#We determine the significant (i.e. non-arbitrary) components of x
Vs_significant = Vs[np.nonzero(s)]
print "Significant variables:",np.nonzero(np.sum(np.abs(Vs_significant),axis = 0))[1]
#This will print:
#Significant variables: [[0 1 2 4]]
#(i.e. x_3 can be chosen arbitrarily without altering the result)
Upvotes: 4