Reputation: 13108
I have a scipy sparse matrix with N nonzero values, which I would like to convert to a numpy array of shape (N, 3), where the first two columns contain the row and column indices of the nonzero values and the last column contains the corresponding nonzero value.
Example:
I would like
mymatrix.todense()
matrix([[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0.83885831, 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. , 1.13395003, 0. , 0. , 0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. , 0.57979727, 0. , 0. , 0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0.75500017, 0. , 0.81459546, 0. , 0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.87997548, 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ]])
to become
np.array([[2, 3, 0.83885831], [3, 5, 1.13395003], [6, 5, 0.57979727], [7, 4, 0.75500017], [7, 6, 0.81459546], [8, 9, 0.87997548]])
array([[2. , 3. , 0.83885831],
[3. , 5. , 1.13395003],
[6. , 5. , 0.57979727],
[7. , 4. , 0.75500017],
[7. , 6. , 0.81459546],
[8. , 9. , 0.87997548]])
How do I do this efficiently?
After the transformation I am going to iterate over the rows, so if there is an efficient way to iterate over the nonzero entries without the transformation, that would be appreciated too:
for index_i, index_j, value in mymatrix.iterator():
    do_something(index_i, index_j, value)
Upvotes: 1
Views: 234
Reputation: 53089
For the iteration, the DOK (dictionary of keys) format looks like a natural match; you can do:
for (i, j), v in your_sparse_matrix.todok().items():
    do_something(i, j, v)
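As a rough illustration of the DOK approach, here is a minimal sketch. The 3x3 matrix `mat` and the list `triples` are made-up stand-ins, not the asker's data, and the order in which the entries come back should not be relied on.

```python
import numpy as np
from scipy import sparse

# Small hypothetical matrix standing in for the real data.
dense = np.array([[0.0, 0.0, 1.5],
                  [0.0, 0.0, 0.0],
                  [2.5, 0.0, 0.0]])
mat = sparse.csr_matrix(dense)

# Each DOK item is ((row, col), value); collect them as plain Python triples.
triples = [(int(i), int(j), float(v)) for (i, j), v in mat.todok().items()]

for i, j, v in triples:
    print(i, j, v)
```

The conversion to DOK builds a dictionary with one entry per nonzero value, so for large matrices it may be worth timing against the COO route.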
The Nx3 list of coordinate-value records can be easily obtained from coo format:
coo = your_sparse_matrix.tocoo()
np.column_stack((coo.row, coo.col, coo.data))
This too can be used for iteration; you'll have to test which is faster in your use case. Note that the stacked array has a single dtype, so the integer indices end up stored as floats.
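A small sketch of the COO route, assuming a made-up 4x5 matrix (`mat`, `records`, and `triples` are illustrative names only). It builds the (N, 3) array and also shows one way to iterate directly over the COO attributes, which keeps the indices as integers.

```python
import numpy as np
from scipy import sparse

# Hypothetical example data with three nonzero entries.
dense = np.zeros((4, 5))
dense[1, 3] = 0.5
dense[2, 0] = 1.25
dense[2, 4] = 2.0
mat = sparse.csr_matrix(dense)

coo = mat.tocoo()

# (N, 3) array of row, column, value; the indices are upcast to float.
records = np.column_stack((coo.row, coo.col, coo.data))

# Alternative: iterate without building the array, keeping integer indices.
triples = list(zip(coo.row.tolist(), coo.col.tolist(), coo.data.tolist()))

for i, j, v in triples:
    print(i, j, v)
```

If the indices are needed for indexing later, the `zip` variant avoids casting them back from float.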
Upvotes: 2