Make42

Reputation: 13108

Transform scipy sparse matrix to index-based numpy array

I have a scipy sparse matrix with N nonzero values, which I would like to get back as a numpy array of shape (N, 3), where the first two columns contain the row and column indices of the nonzero values and the last column contains the respective nonzero value.

Example:

I would like

mymatrix.toarray()
matrix([[0.        , 0.        , 0.        , 0.        , 0.        , 0.        , 0.        , 0.        , 0.        , 0.        , 0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        , 0.        , 0.        , 0.        , 0.        , 0.        , 0.        , 0.        , 0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        , 0.83885831, 0.        , 0.        , 0.        , 0.        , 0.        , 0.        , 0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        , 0.        , 0.        , 1.13395003, 0.        , 0.        , 0.        , 0.        , 0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        , 0.        , 0.        , 0.        , 0.        , 0.        , 0.        , 0.        , 0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        , 0.        , 0.        , 0.        , 0.        , 0.        , 0.        , 0.        , 0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        , 0.        , 0.        , 0.57979727, 0.        , 0.        , 0.        , 0.        , 0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        , 0.        , 0.75500017, 0.        , 0.81459546, 0.        , 0.        , 0.        , 0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        , 0.        , 0.        , 0.        , 0.        , 0.        , 0.        , 0.87997548, 0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        , 0.        , 0.        , 0.        , 0.        , 0.        , 0.        , 0.        , 0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        , 0.        , 0.        , 0.        , 0.        , 0.        , 0.        , 0.        , 0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        , 0.        , 0.        , 0.        , 0.        , 0.        , 0.        , 0.        , 0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        , 0.        , 0.        , 0.        , 0.        , 0.        , 0.        , 0.        , 0.        , 0.        , 0.        ]])

to become

np.array([[2, 3, 0.83885831], [3, 5, 1.13395003], [6, 5, 0.57979727], [7, 4, 0.75500017], [7, 6, 0.81459546], [8, 9, 0.87997548]])

array([[2.        , 3.        , 0.83885831],
       [3.        , 5.        , 1.13395003],
       [6.        , 5.        , 0.57979727],
       [7.        , 4.        , 0.75500017],
       [7.        , 6.        , 0.81459546],
       [8.        , 9.        , 0.87997548]])

How do I do this efficiently?

After the transformation I am going to iterate over the rows, so if there is an efficient way to iterate over the nonzero entries without the transformation, that would be appreciated too. Something like:

for index_i, index_j, value in mymatrix.iterator():
     do_something(index_i, index_j, value)

Upvotes: 1

Views: 234

Answers (1)

Paul Panzer

Reputation: 53089

For the iteration, dok (dictionary of keys) format looks like a natural match; you can do:

for (i, j), v in your_sparse_matrix.todok().items():
    do_something(i, j, v)
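To make the DOK loop concrete, here is a minimal self-contained sketch. The 3x3 matrix and the name `entries` are made up for illustration and are not from the question. The result is sorted because I would not rely on the order in which DOK yields its items.

```python
import numpy as np
from scipy import sparse

# Hypothetical small matrix with three nonzero entries
dense = np.array([[0.0, 0.5, 0.0],
                  [0.0, 0.0, 0.0],
                  [1.5, 0.0, 2.5]])
m = sparse.csr_matrix(dense)

# DOK exposes the stored entries as ((row, col), value) pairs
entries = [(int(i), int(j), float(v)) for (i, j), v in m.todok().items()]
entries.sort()  # impose a deterministic row-major order
print(entries)
```

Note that the conversion to DOK builds a dictionary first, so for a large matrix it may well be slower than looping over the coo arrays directly.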

The Nx3 list of coordinate-value records can be easily obtained from coo format:

import numpy as np
coo = your_sparse_matrix.tocoo()
records = np.column_stack((coo.row, coo.col, coo.data))  # shape (N, 3)

Obviously, this too can be used for iteration; you'll have to test which is faster in your use case.
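As a rough sketch of the coo route, both for building the (N, 3) array and for iterating without building it: the small matrix and the names `records` and `triples` are my own placeholders, not from the question.

```python
import numpy as np
from scipy import sparse

# Hypothetical small matrix with three nonzero entries
dense = np.array([[0.0, 0.5, 0.0],
                  [0.0, 0.0, 0.0],
                  [1.5, 0.0, 2.5]])
coo = sparse.csr_matrix(dense).tocoo()

# (N, 3) array: row index, column index, value.
# The integer indices are upcast to float so all columns share one dtype.
records = np.column_stack((coo.row, coo.col, coo.data))

# Iterating directly over the three coo arrays skips the stacked array
# and keeps the indices as integers.
triples = [(int(i), int(j), float(v))
           for i, j, v in zip(coo.row, coo.col, coo.data)]
print(records)
print(triples)
```

If you need the indices as integers inside the loop, the `zip` variant is probably the more convenient one, since `column_stack` turns them into floats.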

Upvotes: 2
