Reputation: 1554
I have a huge sparse matrix of type scipy.sparse.csr.csr_matrix whose rank I need to estimate. I found scipy.linalg.interpolative.estimate_rank on scipy.org, which seems perfect for this job, but it doesn't support csr_matrix.
from scipy.sparse import load_npz
from scipy.linalg.interpolative import estimate_rank
X = load_npz("https://drive.google.com/uc?export=download&id=1SSR6JWEqG4DXRU9qo78682D9pGJF3Wr0")
print("Rank:", estimate_rank(X, eps=100))
TypeError: invalid input type (must be array or LinearOperator)
The sparse matrix has over 50K rows and nearly 40K columns, so converting it to a dense NumPy array first seems impractical. What should I do to make it work?
The following doesn't work either.
from scipy.sparse import load_npz, linalg
from scipy.linalg.interpolative import estimate_rank
X = load_npz("https://drive.google.com/uc?export=download&id=1SSR6JWEqG4DXRU9qo78682D9pGJF3Wr0")
print("Rank:", estimate_rank(linalg.aslinearoperator(X), eps=100))
ValueError                                Traceback (most recent call last)
in ()
      3
      4 print(type(X))
----> 5 print("Rank of the Document-Term Matrix:", estimate_rank(aslinearoperator(X), eps=1))

1 frames
/usr/local/lib/python3.6/dist-packages/scipy/linalg/_interpolative_backend.py in idd_findrank(eps, m, n, matvect)
    659     :rtype: int
    660     """
--> 661     k, ra, ier = _id.idd_findrank(eps, m, n, matvect)
    662     if ier:
    663         raise _RETCODE_ERROR

ValueError: failed to create intent(cache|hide)|optional array-- must have defined dimensions but got (-1216667648,)
Upvotes: 0
Views: 480
Reputation: 231335
I have used scipy.sparse, but haven't used estimate_rank. But I can read errors and docs.
In [22]: import scipy.linalg.interpolative as inter
In [23]: from scipy import sparse
In [24]: from scipy.sparse import linalg
In [25]: M = sparse.random(100,100,.2, 'csr')
In [36]: inter.estimate_rank(M,.001)
---------------------------------------------------------------------------
...
TypeError: invalid input type (must be array or LinearOperator)
Testing the array option:
In [37]: inter.estimate_rank(M.toarray(),.1)
Out[37]: 100
Testing the LinearOperator option:
In [38]: from scipy.sparse import linalg
In [39]: L = linalg.aslinearoperator(M)
In [40]: L
Out[40]: <100x100 MatrixLinearOperator with dtype=float64>
In [41]: inter.estimate_rank(L,.001)
Out[41]: 99
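Putting the LinearOperator route together as a standalone script, here is a minimal sketch. The helper name estimate_sparse_rank and the default eps=1e-6 are my own choices for illustration, not anything from the question. As far as I can tell from the docs, eps is a relative precision, so a value well below 1 seems more meaningful than the eps=100 or eps=1 used above. I have only tried this on small matrices, so I can't say whether it avoids the dimension error on a 50K x 40K input.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import aslinearoperator
from scipy.linalg.interpolative import estimate_rank


def estimate_sparse_rank(X, eps=1e-6):
    """Estimate the numerical rank of a sparse matrix (hypothetical helper).

    eps is passed straight to estimate_rank; 1e-6 is an assumed default.
    """
    # estimate_rank rejects sparse matrices, so wrap the matrix in a
    # LinearOperator instead of converting it to a dense array.
    X = sparse.csr_matrix(X, dtype=np.float64)
    op = aslinearoperator(X)
    return int(estimate_rank(op, eps))


# Small random test matrix, same shape and density as in the session above.
M = sparse.random(100, 100, density=0.2, format="csr")
k = estimate_sparse_rank(M)
print("Estimated rank:", k)
```

The estimate is randomized, so it can differ slightly between runs and between the array and LinearOperator paths (100 vs. 99 above).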
Upvotes: 1