Reputation: 235
I have a large sparse matrix in the form of a scipy coo_matrix (size of 5GB). I have to make use of the non-zero entries of the matrix and do some further processing.
What would be the best way to access the elements of the matrix? Should I convert the matrix to other formats or use it as it is? Also, could you please tell me the exact syntax for accessing an element of a coo_matrix? I got a bit confused since it doesn't allow slicing.
Upvotes: 0
Views: 1254
Reputation: 74154
First let's build a random COO matrix:
import numpy as np
from scipy import sparse
x = sparse.rand(10000, 10000, format='coo')
The non-zero values are found in the .data attribute of the matrix, and you can get their corresponding row/column indices using x.nonzero():
v = x.data
r, c = x.nonzero()
print(np.all(x.todense()[r, c] == v))
# True
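A COO matrix also exposes its index arrays directly as x.row and x.col, which line up element by element with x.data. The sketch below shows one way to walk through the stored entries and one vectorised alternative. The matrix size, density, and the row-sum computation are illustrative assumptions, not part of the original question.

```python
import numpy as np
from scipy import sparse

# Small illustrative matrix; shape, density and seed are arbitrary choices.
x = sparse.random(1000, 1000, density=0.01, format='coo', random_state=0)

# COO keeps three parallel arrays: row index, column index and value.
# Print the first few stored entries as (row, col, value) triples.
for i, j, v in zip(x.row[:5], x.col[:5], x.data[:5]):
    print(i, j, v)

# For a large matrix a Python loop is slow; operating on the arrays
# directly is usually faster. Example: sum of stored values per row.
row_sums = np.bincount(x.row, weights=x.data, minlength=x.shape[0])
```

If the processing only needs the (row, column, value) triples, this avoids any format conversion and any dense copy of the matrix.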
With a COO matrix it's possible to index a single row or column (as a sparse vector) using the getrow()/getcol() methods. If you want to do slicing or fancy indexing of particular elements then you need to convert it to another format such as lil_matrix, for example using the .tolil() method.
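As a rough sketch of the conversion step, the example below uses CSR and CSC rather than LIL. That is a swapped-in choice: these formats also support element access and slicing, and are generally suited to row and column slicing respectively. The matrix and the index ranges are made up for illustration.

```python
from scipy import sparse

# Small illustrative matrix; shape, density and seed are arbitrary choices.
x = sparse.random(1000, 1000, density=0.01, format='coo', random_state=0)

# Convert once, then index the converted matrix.
x_csr = x.tocsr()

# Look up a single element, here the position of the first stored entry.
i, j = x.row[0], x.col[0]
value = x_csr[i, j]

# Row slicing on CSR returns another sparse matrix.
row_block = x_csr[10:20, :]

# Column slicing is typically better served by CSC.
col_block = x.tocsc()[:, 5:8]
```

The conversion creates a second copy of the data, so for a matrix of several gigabytes it is worth converting only once and dropping the COO copy if it is no longer needed.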
You should really read the scipy.sparse docs for more information about the features of the different sparse array formats - the appropriate choice of format really depends on what you plan on doing with your array.
Upvotes: 1