Reputation:
I have a csr_matrix; say I created it with:
import scipy.sparse as ss
mat = ss.csr_matrix((50, 100))
Now I want to modify some of the values in this matrix. I call:
mat[0,1]+=1
And I get:
SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
I only need to set a few values (few relative to the size of the matrix, at least) right after creating the matrix. Later on I will only read columns or apply element-wise operations to the whole matrix (like .log1p()).
What would be the correct way to do that? Currently I can just ignore the warning, but there may be a better way that doesn't trigger a warning.
Upvotes: 3
Views: 4567
Reputation: 2263
Instead of:
from scipy.sparse import csr_matrix
# Create sparse matrix.
graph = csr_matrix((10, 10))
# Change sparse matrix.
graph[(1, 1)] = 0 # --- SLOW --- ^1
# Do some calculations.
graph += graph
Or:
from scipy.sparse import lil_matrix
# Create sparse matrix.
graph = lil_matrix((10, 10))
# Change sparse matrix.
graph[(1, 1)] = 0
# Do some calculations.
graph += graph # --- SLOW --- ^2
Combine the strengths of both:
from scipy.sparse import csr_matrix, lil_matrix
# Create sparse matrix.
graph = lil_matrix((10, 10))
# Change sparse matrix.
graph[(1, 1)] = 0
# Done with changes to graph. Convert to csr.
graph = csr_matrix(graph)
# Do some calculations.
graph += graph
Don't take "--- SLOW ---" as a one-size-fits-all commandment! It's just a warning that with some data sets there may be faster, more efficient ways of doing things. For other data sets, this would only make your code harder to read and maintain without any performance benefit.
1: "SLOW" as per warning:
/venv/lib/python3.8/site-packages/scipy/sparse/_index.py:82: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
2: "SLOW" as per warning in docs:
Disadvantages of the LIL format:
arithmetic operations LIL + LIL are slow (consider CSR or CSC)
Upvotes: 0
Reputation: 231335
You can control the appearance of warnings. The default is to show them once during a run, and then be silent. You can change that to raise an error, be completely silent, or issue the warning every time.
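As a rough sketch of what that control can look like, the standard warnings module can filter on SparseEfficiencyWarning, which scipy.sparse exports. The helper name set_entry_quietly is made up for this example, and the second half assumes your SciPy version still emits the warning when a new entry is inserted into a csr matrix:

```python
import warnings

from scipy.sparse import SparseEfficiencyWarning, csr_matrix


def set_entry_quietly(mat, row, col, value):
    """Hypothetical helper: assign one entry with the warning silenced locally."""
    with warnings.catch_warnings():
        # The filter only applies inside this block.
        warnings.simplefilter("ignore", SparseEfficiencyWarning)
        mat[row, col] = value
    return mat


mat = csr_matrix((50, 100))
set_entry_quietly(mat, 0, 1, 1.0)

# The opposite choice: turn the warning into an exception.
with warnings.catch_warnings():
    warnings.simplefilter("error", SparseEfficiencyWarning)
    try:
        mat[2, 3] = 1.0
        raised = False
    except SparseEfficiencyWarning:
        raised = True
```

Using "always" instead of "ignore" or "error" would issue the warning on every assignment rather than once per location.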
A common way of creating a sparse matrix is to build the three coo-style arrays (values, row indices, column indices) holding all the nonzero values. Then make a coo matrix, or a csr matrix directly (it takes the same style of input).
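For the 50x100 matrix in the question, that could look roughly like this. The positions and values are made up for illustration; the point is that all nonzeros are known up front, so nothing is ever assigned by index and no warning is triggered:

```python
import numpy as np
from scipy.sparse import coo_matrix, csr_matrix

# Illustrative nonzero entries: vals[k] goes at (rows[k], cols[k]).
rows = np.array([0, 3, 7])
cols = np.array([1, 20, 99])
vals = np.array([1.0, 2.0, 3.0])

# Either go through coo and convert...
coo = coo_matrix((vals, (rows, cols)), shape=(50, 100))
mat = coo.tocsr()

# ...or hand the same (values, (rows, cols)) input straight to csr_matrix.
direct = csr_matrix((vals, (rows, cols)), shape=(50, 100))

# Element-wise operations such as log1p then work on the csr result.
logged = mat.log1p()
```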
The coo format doesn't have indexing, so you can't do M[i,j]=1 anyway. But csr does implement it. I think the warning is there to discourage multiple changes (in a loop), not one or two.
Changing the sparsity of a csr matrix requires recalculating the whole set of attributes (data, indices, and index pointers). That's why it's expensive. I haven't done timings, but it may be almost as expensive as making the array fresh.
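To see what gets recalculated, you can inspect the data, indices and indptr attributes of a small csr matrix before and after inserting a new nonzero. This is only a toy 3x4 example with made-up values, and the warning is silenced so the output stays clean:

```python
import warnings

import numpy as np
from scipy.sparse import SparseEfficiencyWarning, csr_matrix

# Two nonzeros: 1.0 at (0, 1) and 2.0 at (2, 3); row 1 is empty.
mat = csr_matrix(
    (np.array([1.0, 2.0]), (np.array([0, 2]), np.array([1, 3]))),
    shape=(3, 4),
)
before = (mat.data.tolist(), mat.indices.tolist(), mat.indptr.tolist())
# before == ([1.0, 2.0], [1, 3], [0, 1, 1, 2])

with warnings.catch_warnings():
    warnings.simplefilter("ignore", SparseEfficiencyWarning)
    mat[1, 0] = 5.0  # new nonzero: changes the sparsity structure

after = (mat.data.tolist(), mat.indices.tolist(), mat.indptr.tolist())
# after == ([1.0, 5.0, 2.0], [1, 0, 3], [0, 1, 2, 3])

# Overwriting an existing nonzero only touches data; the structure is unchanged.
mat[0, 1] = 9.0
same_structure = (
    mat.indices.tolist() == after[1] and mat.indptr.tolist() == after[2]
)
```

All three arrays changed for a single insertion, while overwriting a value that is already stored leaves indices and indptr alone.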
lil is supposed to be better for incremental assignment. It keeps its data in lists of lists, and inserting values into lists is fast. But converting csr to lil and back takes time, so I wouldn't do it for just a few additions.
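If you do end up with many assignments, a minimal sketch of the lil route looks like this (values are illustrative). The rows and data attributes show the per-row lists that make insertion cheap:

```python
from scipy.sparse import lil_matrix

# Start in lil, where each row keeps its own Python lists.
lil = lil_matrix((50, 100))
lil[0, 1] = 1.0
lil[0, 5] = 2.0
lil[3, 2] = 4.0

# Per-row storage: column indices in lil.rows, values in lil.data.
row0_cols = lil.rows[0]  # [1, 5]
row0_vals = lil.data[0]  # [1.0, 2.0]

# Convert once, after all assignments are done.
csr = lil.tocsr()
```

Whether this beats a couple of direct csr assignments depends on how many entries you set, so it is worth timing on your own data.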
Upvotes: 1