Twinhelix

Reputation: 165

Performing Decomposition on Sparse Matrices in Python

I'm trying to decompose signals into components (matrix factorization) in a large sparse matrix in Python using the sklearn library.

I used scipy.sparse.csc_matrix to construct my data matrix. However, I'm unable to perform analyses such as factor analysis or independent component analysis. The only thing I'm able to do is use TruncatedSVD or scipy.sparse.linalg.svds and perform PCA.

Does anyone know of any workarounds for doing ICA or FA on a sparse matrix in Python? Any help would be much appreciated! Thanks.

Upvotes: 2

Views: 1397

Answers (3)

Victor Axelsson

Reputation: 1466

Given:

M = UΣV^t

The drawback with SVD is that the matrices U and V^t are dense. It doesn't really matter that the input matrix is sparse: U and V^t will be dense. Also, the computational complexity of a full SVD is O(n^2*m) or O(m^2*n), where n is the number of rows and m the number of columns of the input matrix M. Which one applies depends on which dimension is larger: the cost is quadratic in the smaller dimension and linear in the larger one.

It is worth mentioning that SVD gives you the optimal low-rank approximation. If you can live with a slightly larger approximation error, measured by the Frobenius norm, you might want to consider the CUR algorithm. It scales to larger datasets with O(n*m).
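To make the density point concrete, here is a minimal sketch using scipy.sparse.linalg.svds. The matrix size, density, and k are made-up values for illustration only.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import svds

# Hypothetical data: a 1000 x 200 sparse matrix with 1% non-zero entries.
M = sparse.random(1000, 200, density=0.01, format="csc", random_state=0)

# Truncated SVD: only the k largest singular triplets are computed.
k = 10
U, s, Vt = svds(M, k=k)

print(sparse.issparse(M), M.shape)   # the input is sparse
print(type(U).__name__, U.shape)     # U is a dense ndarray of shape (1000, 10)
print(type(Vt).__name__, Vt.shape)   # Vt is a dense ndarray of shape (10, 200)
```

Even though M stores only its non-zero entries, the factors come back as ordinary dense NumPy arrays, so memory grows with k times the number of rows and columns.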

M ≈ CUR

where C and R are now SPARSE matrices, because they consist of actual columns and rows of M.

If you want to look at a Python implementation, take a look at pymf. But be a bit careful with that particular implementation, since there seems to be, at this point in time, an open issue with it.
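Independently of pymf, the idea can be sketched directly with NumPy and SciPy. This is only a rough illustration, not pymf's API: the helper name cur_sketch, the sampling by squared column and row norms, and the choice U = pinv(C) M pinv(R) are assumptions made for this example, and the dense pseudo-inverses are only reasonable when c and r are small.

```python
import numpy as np
from scipy import sparse

def cur_sketch(M, c, r, seed=0):
    """Rough CUR approximation M ~ C @ U @ R (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    M = sparse.csc_matrix(M)

    # Sampling probabilities proportional to squared column / row norms.
    sq = M.multiply(M)  # element-wise squares, still sparse
    col_p = np.asarray(sq.sum(axis=0)).ravel()
    row_p = np.asarray(sq.sum(axis=1)).ravel()
    col_p = col_p / col_p.sum()
    row_p = row_p / row_p.sum()

    cols = rng.choice(M.shape[1], size=c, replace=False, p=col_p)
    rows = rng.choice(M.shape[0], size=r, replace=False, p=row_p)

    C = M[:, cols]          # sparse, n x c: actual columns of M
    R = M.tocsr()[rows, :]  # sparse, r x m: actual rows of M

    # Small dense link matrix of shape c x r.
    U = np.linalg.pinv(C.toarray()) @ (M @ np.linalg.pinv(R.toarray()))
    return C, U, R

# Hypothetical data: 300 x 100 sparse matrix with 5% non-zero entries.
M = sparse.random(300, 100, density=0.05, format="csc", random_state=0)
C, U, R = cur_sketch(M, c=20, r=20)

# Densify only to measure the error on this small example.
approx = C.toarray() @ U @ R.toarray()
rel_err = np.linalg.norm(M.toarray() - approx) / np.linalg.norm(M.toarray())
print(C.shape, U.shape, R.shape, rel_err)
```

Only the small matrix U is dense here; C and R keep the sparsity of M. The relative Frobenius error will generally be worse than that of an SVD of the same rank, which is the trade-off described above.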

Upvotes: 1

ShacharSh

Reputation: 510

It is usually best practice to build the matrix as a coo_matrix and then convert it with .tocsc() before manipulating it.
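A minimal sketch of that workflow, with made-up (row, column, value) triplets:

```python
import numpy as np
from scipy import sparse

# Hypothetical triplets; COO allows duplicate coordinates, here (3, 2) twice.
rows = np.array([0, 0, 1, 3, 3])
cols = np.array([0, 2, 1, 2, 2])
vals = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Build in COO format, which is cheap to construct from triplets.
A_coo = sparse.coo_matrix((vals, (rows, cols)), shape=(4, 3))

# Convert to CSC for arithmetic and column slicing; duplicates are summed.
A = A_coo.tocsc()
print(A.format, A.nnz, A[3, 2])
```

COO is convenient for assembling the data, while CSC supports efficient column slicing and matrix-vector products.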

Upvotes: 0

valentin

Reputation: 3608

Even if the input matrix is sparse, the output will not be a sparse matrix. If the system cannot handle a dense matrix, it will not be able to handle the results either.
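Since the result is dense anyway, one possible workaround is to reduce the sparse matrix with TruncatedSVD first and run ICA on the much smaller dense output. This is only a sketch under assumptions: the data and component counts are made up, a recent scikit-learn is assumed for the whiten="unit-variance" option, and ICA on the reduced data is not the same as ICA on the full matrix.

```python
from scipy import sparse
from sklearn.decomposition import FastICA, TruncatedSVD

# Hypothetical data: 200 samples x 50 features, 10% non-zero entries.
X = sparse.random(200, 50, density=0.1, format="csc", random_state=0)

# Step 1: TruncatedSVD accepts sparse input and returns a dense array.
X_reduced = TruncatedSVD(n_components=5, random_state=0).fit_transform(X)

# Step 2: FastICA runs on the small dense matrix from step 1.
ica = FastICA(n_components=3, whiten="unit-variance",
              random_state=0, max_iter=1000)
sources = ica.fit_transform(X_reduced)

print(X_reduced.shape, sources.shape)
```

The dense intermediate has only n_samples x n_components entries, so it may fit in memory even when the densified input would not.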

Upvotes: 0
