Reputation: 750
I have sparse CSR matrices (each the product of two sparse vectors) and I want to convert each matrix to a flat vector. I want to avoid using any dense representation or iterating over indexes.
So far, the only solution I have come up with is to iterate over the nonzero elements using the COO representation:
import numpy
from scipy import sparse as sp
matrices = [sp.csr_matrix([[1,2],[3,4]])]*3
vectorSize = matrices[0].shape[0]*matrices[0].shape[1]
flatMatrixData = []
flatMatrixRows = []
flatMatrixCols = []
for i in range(len(matrices)):
    matrix = matrices[i].tocoo()
    flatMatrixData += matrix.data.tolist()
    flatMatrixRows += [i]*matrix.nnz
    # column-major flat index; the 2 is the number of rows in each matrix
    flatMatrixCols += [r+c*2 for r,c in zip(matrix.row, matrix.col)]
flatMatrix = sp.coo_matrix((flatMatrixData,(flatMatrixRows, flatMatrixCols)), shape=(len(matrices), vectorSize), dtype=numpy.float64).tocsr()
This is unsatisfying and inelegant. Does anyone know how to achieve this efficiently?
Upvotes: 2
Views: 4846
Reputation: 231325
Your flatMatrix is (3,4); each row is [1 3 2 4]. If a submatrix is x, then the row is x.A.T.flatten().
F = sp.vstack([x.T.tolil().reshape((1,vectorSize)) for x in matrices])
F is the same (dtype is int). I had to convert each submatrix to lil since csr has not implemented reshape (in my version of sparse). I don't know if other formats work.
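As a sanity check, here is a minimal sketch that compares the one-liner above against a dense reference on the toy matrices from the question. The names vector_size, expected and same are mine, and I use toarray() in place of the .A shorthand; the dense step is only there for verification on this tiny example.

```python
import numpy as np
from scipy import sparse as sp

matrices = [sp.csr_matrix([[1, 2], [3, 4]])] * 3
vector_size = matrices[0].shape[0] * matrices[0].shape[1]

# Sparse route: transpose, convert to lil, reshape to a single row, then stack.
F = sp.vstack([x.T.tolil().reshape((1, vector_size)) for x in matrices])

# Dense reference (column-major flattening of each submatrix), for checking only.
expected = np.array([x.toarray().T.flatten() for x in matrices])

same = np.array_equal(F.toarray(), expected)
print(F.shape, same)
```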
Ideally sparse would let you do the whole range of numpy array (or matrix) manipulations, but it isn't there yet.
Given the small dimensions in this example, I won't speculate on the speed of the alternatives.
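If the Python-level list building in the question turns out to be the bottleneck, one possible variation is to keep the COO approach but assemble the index arrays with numpy instead of lists. This is only a sketch under the assumption that all submatrices share the same shape; n_rows, n_cols, coos and flat are names I introduced, and I have not timed it against the alternatives.

```python
import numpy as np
from scipy import sparse as sp

matrices = [sp.csr_matrix([[1, 2], [3, 4]])] * 3
n_rows, n_cols = matrices[0].shape  # assumes every submatrix has this shape
vector_size = n_rows * n_cols

coos = [m.tocoo() for m in matrices]

# Nonzero values of all submatrices, in order.
data = np.concatenate([c.data for c in coos])
# Output row i holds the entries of submatrix i.
rows = np.repeat(np.arange(len(coos)), [c.nnz for c in coos])
# Column-major flat index, same layout as r + c*2 in the question.
cols = np.concatenate([c.row + c.col * n_rows for c in coos])

flat = sp.coo_matrix((data, (rows, cols)),
                     shape=(len(coos), vector_size)).tocsr()
print(flat.toarray())
```

There is still a loop over the matrices themselves, but no per-element Python iteration and no dense intermediate.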
Upvotes: 3