user6170905

What is the correct way to add elements to a csr_matrix?

I have a csr_matrix, created like this:

import scipy.sparse as ss
mat = ss.csr_matrix((50, 100))

Now I want to modify some of the values in this matrix. I call:

mat[0,1]+=1

And I get:

SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.

I only need to set a few values (few relative to the size of the matrix, at least) just after creating the matrix. Later on I will only read columns or do element-wise operations on the whole matrix (like .log1p()).

What would be the correct way to do that? For now I can just ignore the warning, but there may be a better way that doesn't produce a warning.

Upvotes: 3

Views: 4567

Answers (2)

keithpjolley

Reputation: 2263

Instead of:

from scipy.sparse import csr_matrix

# Create sparse matrix.
graph = csr_matrix((10, 10))
# Change sparse matrix.
graph[(1, 1)] = 0      # --- SLOW --- ^1
# Do some calculations.
graph += graph

Or:

from scipy.sparse import lil_matrix

# Create sparse matrix.
graph = lil_matrix((10, 10))
# Change sparse matrix.
graph[(1, 1)] = 0
# Do some calculations.
graph += graph         # --- SLOW --- ^2

Combine the strengths of both:

from scipy.sparse import csr_matrix, lil_matrix

# Create sparse matrix.
graph = lil_matrix((10, 10))
# Change sparse matrix.
graph[(1, 1)] = 0
# Done with changes to graph. Convert to csr.
graph = csr_matrix(graph)
# Do some calculations.
graph += graph         

Don't take "--- SLOW ---" as a one-size-fits-all commandment! It's just a warning that, with some data sets, there may be faster, more efficient ways of doing things. For other data sets, following it would only make your code harder to read and maintain without any performance benefit.

1: "SLOW" as per warning:

/venv/lib/python3.8/site-packages/scipy/sparse/_index.py:82: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.

2: "SLOW" as per warning in docs:

Disadvantages of the LIL format:
arithmetic operations LIL + LIL are slow (consider CSR or CSC)

Upvotes: 0

hpaulj

Reputation: 231335

You can control the appearance of warnings. The default is to show them once during a run, and then be silent. You can change that to raise an error, be completely silent, or issue the warning every time.
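As a rough sketch of the options just described, Python's standard warnings module can filter this one category locally. The matrix shape and indices are taken from the question; everything else is only illustrative.

```python
import warnings

from scipy.sparse import SparseEfficiencyWarning, csr_matrix

mat = csr_matrix((50, 100))

# Silence only this warning category, and only inside the block.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", SparseEfficiencyWarning)
    mat[0, 1] += 1

# Alternatively, turn the warning into an exception to catch accidental use.
with warnings.catch_warnings():
    warnings.simplefilter("error", SparseEfficiencyWarning)
    try:
        mat[0, 2] = 1
        raised = False
    except SparseEfficiencyWarning:
        raised = True
```

Using catch_warnings keeps the filter change from leaking into the rest of the program.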

A common way of creating a sparse matrix is to create the 3 coo style arrays, with all nonzero values. Then make a coo matrix, or csr directly (it takes the same style of input).
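A minimal sketch of that approach, assuming the few nonzero entries are known up front (the positions and values here are made up for illustration):

```python
import numpy as np
from scipy.sparse import coo_matrix, csr_matrix

# The three coo-style arrays: row indices, column indices, values.
rows = np.array([0, 3, 7])
cols = np.array([1, 20, 99])
vals = np.array([1.0, 2.0, 3.0])

# Build a coo matrix first, then convert to csr ...
coo = coo_matrix((vals, (rows, cols)), shape=(50, 100))
mat = coo.tocsr()

# ... or pass the same arrays straight to csr_matrix.
direct = csr_matrix((vals, (rows, cols)), shape=(50, 100))
```

Neither route assigns into an existing csr matrix, so no SparseEfficiencyWarning is involved.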

coo format doesn't support indexing, so you can't do M[i,j]=1 anyway. But csr does implement it. I think the warning is there to discourage many changes (in a loop), not one or two.

Changing the sparsity structure of a csr matrix requires recalculating the whole set of attributes (data, indices, and index pointers). That's why it's expensive. I haven't done timings, but it may be almost as expensive as building the matrix from scratch.
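To make this concrete, here is a small sketch on a hypothetical 3x4 matrix that inspects the csr attributes after a single insertion. It illustrates what gets rebuilt and says nothing about timing.

```python
import warnings

from scipy.sparse import SparseEfficiencyWarning, csr_matrix

mat = csr_matrix((3, 4))
indptr_before = mat.indptr.copy()  # all zeros: every row is empty

with warnings.catch_warnings():
    warnings.simplefilter("ignore", SparseEfficiencyWarning)
    mat[1, 2] = 5.0  # adds a new stored entry, changing the structure

# data and indices each gain one element, and indptr is rewritten
# so that row 1 now spans one stored entry.
print(indptr_before, "->", mat.indptr)
print(mat.indices, mat.data)
```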

lil is supposed to be better for incremental assignment. It keeps its data in lists of lists, and inserting values into lists is fast. But converting csr to lil and back takes time, so I wouldn't do it for just a few additions.
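The list-of-lists layout can be inspected through the rows and data attributes of a lil_matrix. This is a small sketch on a hypothetical 3x4 matrix:

```python
from scipy.sparse import lil_matrix

lil = lil_matrix((3, 4))
lil[1, 2] = 5.0
lil[1, 0] = 7.0

# rows[i] is a list of the column indices stored in row i (kept sorted);
# data[i] is the list of matching values.
print(lil.rows[1], lil.data[1])

# Convert once, after all assignments are done.
csr = lil.tocsr()
```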

Upvotes: 1
