Reputation: 165
I'm trying to decompose signals into components (matrix factorization) for a large sparse matrix in Python using the sklearn library.
I used scipy's scipy.sparse.csc_matrix to construct my data matrix. However, I'm unable to run analyses such as factor analysis (FA) or independent component analysis (ICA) on it. The only thing I'm able to do is use sklearn's TruncatedSVD or scipy's scipy.sparse.linalg.svds and perform PCA.
Does anyone know of any workarounds for doing ICA or FA on a sparse matrix in Python? Any help would be much appreciated! Thanks.
Upvotes: 2
Views: 1397
Reputation: 1466
Given:
M = UΣV^t
The drawback of SVD is that the matrices U and V^t are dense. It doesn't really matter that the input matrix is sparse; U and V^t will still be dense. Also, the computational complexity of a full SVD is O(n^2*m) or O(m^2*n), whichever is smaller, where n is the number of rows and m the number of columns of the input matrix M. Which one applies depends on whether n or m is larger.
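To illustrate the density point, here is a small sketch using scipy.sparse.linalg.svds (the function mentioned in the question). The matrix size, density, and k are made-up toy values, not anything from the original problem.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import svds

# Hypothetical toy data: a 1000 x 200 sparse matrix with about 1% non-zeros.
M = sparse.random(1000, 200, density=0.01, format="csc", random_state=0)

# Truncated SVD keeping k = 10 singular triplets.
U, s, Vt = svds(M, k=10)

# The input is sparse, but the factors come back as ordinary dense arrays.
print(sparse.issparse(M))     # input is a scipy sparse matrix
print(type(U), U.shape)       # dense ndarray of shape (n_rows, k)
print(type(Vt), Vt.shape)     # dense ndarray of shape (k, n_cols)
```

With a truncated SVD the dense factors only have k columns (or rows), so they stay manageable as long as k is small; it is the full SVD where the dense factors become a memory problem.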
It is worth mentioning that SVD gives you the optimal low-rank approximation, as measured by the Frobenius norm. If you can live with a small loss in accuracy, you might want to consider using the CUR algorithm instead. It will scale to larger datasets with O(n*m).
M ≈ CUR
where C and R are now SPARSE matrices, because they are made up of actual columns and rows of M.
If you want to look at a Python implementation, take a look at pymf. But be a bit careful with that particular implementation, since at this point in time there seems to be an open issue with it.
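For a feel of how CUR works, here is a rough, self-contained sketch in NumPy/SciPy. It is NOT pymf's implementation: it assumes one common variant, where columns and rows are sampled with probability proportional to their squared norms and the small middle matrix is computed with pseudo-inverses. The names sample_by_norm and U_mid, the matrix size, and the number of sampled columns/rows (30) are all made up for illustration.

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)

# Hypothetical toy data: a 200 x 100 sparse matrix with about 5% non-zeros.
M = sparse.random(200, 100, density=0.05, format="csc", random_state=0)

def sample_by_norm(sq_norms, size):
    """Pick `size` distinct indices with probability proportional to squared norm."""
    p = sq_norms / sq_norms.sum()
    return np.sort(rng.choice(len(p), size=size, replace=False, p=p))

sq = M.multiply(M)  # element-wise squares, still sparse
col_idx = sample_by_norm(np.asarray(sq.sum(axis=0)).ravel(), 30)
row_idx = sample_by_norm(np.asarray(sq.sum(axis=1)).ravel(), 30)

C = M[:, col_idx]            # actual columns of M, so C stays sparse
R = M.tocsr()[row_idx, :]    # actual rows of M, so R stays sparse

# Small dense middle matrix: pinv(C) @ M @ pinv(R), shape (30, 30).
U_mid = np.linalg.pinv(C.toarray()) @ (M @ np.linalg.pinv(R.toarray()))

# Dense reconstruction, done here only to check the error on toy data.
approx = (C @ U_mid) @ R.toarray()
rel_err = np.linalg.norm(M.toarray() - approx) / np.linalg.norm(M.toarray())
print(C.shape, R.shape, U_mid.shape, rel_err)
```

Only the middle matrix is dense, and it is small (30 x 30 here). The error will generally be larger than what a truncated SVD of comparable rank gives; that is the trade-off mentioned above.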
Upvotes: 1
Reputation: 510
It is usually best practice to use coo_matrix to build the matrix and then convert it using .tocsc() to manipulate it.
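A minimal sketch of that pattern, with made-up toy values:

```python
import numpy as np
from scipy.sparse import coo_matrix

# Hypothetical toy data given as (row, column, value) triplets.
rows = np.array([0, 1, 2, 2])
cols = np.array([0, 2, 1, 2])
vals = np.array([4.0, 5.0, 7.0, 9.0])

# COO is convenient for construction from triplets.
A_coo = coo_matrix((vals, (rows, cols)), shape=(3, 3))

# CSC is better suited to arithmetic and column slicing.
A_csc = A_coo.tocsc()

# Example manipulation: per-column sums.
col_sums = np.asarray(A_csc.sum(axis=0)).ravel()
print(A_csc.format, col_sums)
```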
Upvotes: 0
Reputation: 3608
Even if the input matrix is sparse, the output will not be a sparse matrix. If your system cannot handle a dense matrix of that size, it will not be able to handle the results either.
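One possible way around this, offered as an assumption on my part rather than a tested recipe for your data: reduce the sparse matrix to a small number of dense columns with TruncatedSVD first, then run FastICA on that reduced (dense but small) matrix. The sizes and component counts below are made-up toy values.

```python
from scipy import sparse
from sklearn.decomposition import TruncatedSVD, FastICA

# Hypothetical toy data: a 500 x 300 sparse matrix with about 2% non-zeros.
X = sparse.random(500, 300, density=0.02, format="csc", random_state=0)

# Step 1: TruncatedSVD accepts sparse input and returns a dense array
# with only n_components columns.
svd = TruncatedSVD(n_components=20, random_state=0)
X_reduced = svd.fit_transform(X)

# Step 2: FastICA needs dense input, which is affordable at this size.
ica = FastICA(n_components=5, random_state=0, max_iter=1000)
S = ica.fit_transform(X_reduced)

print(X_reduced.shape, S.shape)
```

Keep in mind that ICA run this way only sees the subspace kept by the SVD step, so the result is not the same as running ICA on the full matrix.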
Upvotes: 0