Simone Bolognini
Simone Bolognini

Reputation: 524

Routine to extract linear independent rows from a rank deficient matrix

I'm struggling with the following problem: I have some very big matrices (say, at least, 2000x2000, and probably in the future they will even reach 10000x10000) with very small rank (2 or 3, call it N) and I need to find an efficient Python routine to extract the linear independent rows (or columns, the matrix is symmetric!) From them. I tried to take the first N columns of the Q matrix of QR decomposition but it seems not to work correctly (is this wrong maybe?).

Here's the Python code I use to implement the method suggested by Ami Tavory:

from numpy import absolute
from numpy.linalg import qr

q = qr(R)[1] #R is my matrix
q = absolute(q)
sums = sum(q,axis=1)

i = 0
while( i < dim ): #dim is the matrix dimension
    if(sums[i] > 1.e-10):
       print "%d is a good index!" % i
    i += 1

This should tell me if the row is non-zero and therefore if the I-th column of R is linearly independent.

Upvotes: 8

Views: 8214

Answers (3)

suntzuisafterU
suntzuisafterU

Reputation: 5

Here is a minimal example demonstrating Ami's suggestion of using the QR decomp and taking non-null rows of R doesn't work as described:

# 2 dependent rows
>> A = array([[ 1.,  1.,  0.,  0.],
              [-1., -1.,  0.,  0.]])
# looks like it works for this case
>> numpy.linalg.qr(A.T)[1]
array([[-1.41421356,  1.41421356], 
       [ 0.        ,  0.        ]])

# counter example
>> A2 = numpy.array([[1,1,0,0.],[-1,-1,0,0.],[1,2,3,4]])
>> numpy.linalg.qr(A2.T)[1] 
array([[-1.41421356,  1.41421356, -2.12132034],
       [ 0.        ,  0.        ,  0.70710678], # 2nd row is not null
       [ 0.        ,  0.        , -5.        ]]) 

The second pivot is missing, it's possible there is some structure here that could be used but not the rows.

Upvotes: 0

Simone Bolognini
Simone Bolognini

Reputation: 524

Here's the link where I found a solution to my problem! The Cauchy-Schwartz method proposed here works just fine!!

Upvotes: 0

Ami Tavory
Ami Tavory

Reputation: 76366

The Gram Schmidt process finds a basis (equivalently largest independent subset) using linear combinations, and the QR Decomposition effectively mimics this.

Therefore, one way to do what you want is to apply numpy.linalg.qr to the transpose, and check the non-zero components of the R matrix. The corresponding columns (in the transpose matrix, i.e., the rows in your original matrix) are independent.


Edit After some searching, I believe this Berkeley lecture explains it, but here are examples

import numpy as np

# 2nd column is redundant
a = np.array([[1, 0, 0], [0, 0, 0], [1, 0, 1]])
>> np.linalg.qr(a)[1] # 2nd row empty
array([[ 1.41421356,  0.        ,  0.70710678],
   [ 0.        ,  0.        ,  0.        ],
   [ 0.        ,  0.        ,  0.70710678]])

# 3rd column is redundant
a = np.array([[1, 0, 0], [1, 0, 1], [0, 0, 0], ])
>> np.linalg.qr(a)[1] # 3rd row empty
array([[ 1.41421356,  0.        ,  0.70710678],
   [ 0.        ,  0.        , -0.70710678],
   [ 0.        ,  0.        ,  0.        ]])

# No column redundant
a = np.array([[1, 0, 0], [1, 0, 1], [2, 3, 4], ])
>> np.linalg.qr(a)[1] # No row empty
array([[ 2.44948974,  2.44948974,  3.67423461],
   [ 0.        ,  1.73205081,  1.73205081],
   [ 0.        ,  0.        ,  0.70710678]])

Upvotes: 10

Related Questions