Scipy Handling of Large COO matrix

Question

I have a large sparse matrix in the form of a scipy coo_matrix (size of 5GB). I have to make use of the non-zero entries of the matrix and do some further processing.

What would be the best way to access the elements of the matrix? Should I convert the matrix to other formats or use it as it is? Also, could you please tell me the exact syntax for accessing an element of a coo_matrix? I got a bit confused since it doesn't allow slicing.

ali_m · Accepted Answer

First let's build a random COO matrix:

import numpy as np
from scipy import sparse

x = sparse.rand(10000, 10000, format='coo')

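The stored values are found in the .data attribute of the matrix, and their row/column indices are held in .row and .col. You can also get the indices using x.nonzero(), which returns the same result as long as the matrix stores no explicit zeros: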

v = x.data
r, c = x.nonzero()

print(np.all(x.todense()[r, c] == v))
# True
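For a matrix as large as the one in the question, the dense check above would not fit in memory, so it may help to work on the three COO arrays directly. The following is a minimal sketch under assumed values: the matrix size, density, seed, and the 0.5 threshold are arbitrary choices for illustration, not anything from the question.

```python
import numpy as np
from scipy import sparse

# Small illustrative matrix; size, density and seed are arbitrary assumptions.
x = sparse.random(1000, 1000, density=0.01, format='coo', random_state=0)

# COO keeps three parallel arrays: row index, column index, stored value.
rows, cols, vals = x.row, x.col, x.data

# Vectorized processing of the stored entries, with no dense copy.
total = vals.sum()
big = vals > 0.5                                   # hypothetical filter
big_coords = np.column_stack((rows[big], cols[big]))

# Entry-by-entry iteration also works, but a pure-Python loop is slow
# for millions of entries, so only the first few are shown here.
for i, j, v in zip(rows[:5], cols[:5], vals[:5]):
    print(i, j, v)
```

Keeping the processing in NumPy operations on rows, cols and vals avoids both the dense conversion and a Python-level loop over every entry.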

With a COO matrix it's possible to extract a single row or column (as a sparse vector) using the getrow()/getcol() methods. If you want to do slicing or fancy indexing of particular elements then you need to convert it to another format, such as lil_matrix using the .tolil() method, or csr_matrix using .tocsr().
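As a rough sketch of the conversion route, the example below converts a small COO matrix to CSR and CSC and then indexes it. The matrix size, seed, and slice bounds are assumptions chosen only for illustration.

```python
import numpy as np
from scipy import sparse

# Small illustrative matrix; size, density and seed are arbitrary assumptions.
x = sparse.random(1000, 1000, density=0.01, format='coo', random_state=0)

# CSR supports single-element lookup and efficient row slicing.
x_csr = x.tocsr()
i, j = int(x.row[0]), int(x.col[0])   # coordinates of one stored entry
elem = x_csr[i, j]                    # a scalar value
row_block = x_csr[10:20, :]           # sparse block of 10 rows

# CSC is the column-oriented counterpart, better suited to column slicing.
x_csc = x.tocsc()
col_block = x_csc[:, 5:8]             # sparse block of 3 columns

print(elem, row_block.shape, col_block.shape)
```

Note that each conversion builds a new matrix alongside the original, so for a matrix of several gigabytes it is worth converting once to the format that matches the access pattern and then dropping the COO copy.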

You should really read the scipy.sparse docs for more information about the features of the different sparse matrix formats. The appropriate choice of format depends on what you plan on doing with your matrix.
