Why do I get a warning on SciPy sparse column slicing?

Question

The SciPy sparse documentation for csr_matrix says that this format is efficient for row slicing. When I run this code, which updates a column slice:

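import numpy as np
from scipy import sparse

# 5x1 column vector with a single nonzero at row 1
dok = sparse.dok_matrix((5,1))
dok[1,0] = 1

# 5x5 matrix with the values 0..4 on the diagonal
data = np.array([0,1,2,3,4])
row = np.array([0,1,2,3,4])
col = np.array([0,1,2,3,4])
csr = sparse.csr_matrix((data, (row, col)))

# add the vector to column 0 in place
csr[:, 0] += dok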

I get this warning:

SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.

Why am I getting this warning?

Paul Panzer · Accepted Answer

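This is unrelated to row vs. column slicing. Position (1, 0) is not yet stored in csr, so adding dok to column 0 forces SciPy to insert a new element into the middle of the data and indices arrays (and to shift the row pointers in indptr), which, as the warning says, is expensive.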
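A quick way to check that the slicing direction doesn't matter is to record the warnings raised by a few different updates. This is an illustrative sketch: the helper names (emits_efficiency_warning, column_new_entry, row_new_entry, column_existing_only) are made up for this example, and SciPy's internal code paths may differ between versions. Updates that create a new stored entry warn whether they go through a column slice or a row slice, while an update that only touches already-stored entries does not.

```python
import warnings

import numpy as np
from scipy import sparse


def emits_efficiency_warning(update):
    """Apply `update` to a fresh 5x5 diagonal CSR matrix and report whether
    SciPy emitted a SparseEfficiencyWarning while doing so."""
    m = sparse.csr_matrix(np.diag([1.0, 2.0, 3.0, 4.0, 5.0]))  # 5 stored entries
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        update(m)
    return any(issubclass(w.category, sparse.SparseEfficiencyWarning)
               for w in caught)


def column_new_entry(m):
    col = sparse.dok_matrix((5, 1))
    col[1, 0] = 1      # (1, 0) is not stored in m yet
    m[:, 0] += col     # the question's column-slice update


def row_new_entry(m):
    row = sparse.dok_matrix((1, 5))
    row[0, 1] = 1      # (0, 1) is not stored in m yet
    m[0, :] += row     # the same kind of update, but on a row slice


def column_existing_only(m):
    m[:, 0] *= 2       # only rescales the already-stored entry at (0, 0)


print(emits_efficiency_warning(column_new_entry))      # True
print(emits_efficiency_warning(row_new_entry))         # True
print(emits_efficiency_warning(column_existing_only))  # False
```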

Let's look at the internal representation of csr before and after the in-place modification to confirm this:

>>> csr.data
array([0, 1, 2, 3, 4], dtype=int64)
>>> csr.indices
array([0, 1, 2, 3, 4], dtype=int32)
>>> 
>>> csr[:, 0] += dok
/home/paul/lib/python3.6/site-packages/scipy/sparse/compressed.py:742: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
  SparseEfficiencyWarning)
>>> csr.data
array([0, 1, 1, 2, 3, 4], dtype=int64)
>>> csr.indices
array([0, 0, 1, 2, 3, 4], dtype=int32)
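If you really do need to change the structure, the warning itself points at the usual workaround: make the edits on a lil_matrix and convert back to CSR once at the end. A minimal sketch reusing the matrices from the question; escalating the warning to an error is only there to show that it is not raised on this path.

```python
import warnings

import numpy as np
from scipy import sparse

data = np.array([0, 1, 2, 3, 4])
row = np.array([0, 1, 2, 3, 4])
col = np.array([0, 1, 2, 3, 4])
csr = sparse.csr_matrix((data, (row, col)))

dok = sparse.dok_matrix((5, 1))
dok[1, 0] = 1

with warnings.catch_warnings():
    # Turn the efficiency warning into an error, so any occurrence would fail loudly.
    warnings.simplefilter("error", sparse.SparseEfficiencyWarning)
    lil = csr.tolil()     # LIL keeps one Python list per row, so new entries are cheap
    lil[:, 0] += dok      # same update as in the question, but on the LIL copy
    csr = lil.tocsr()     # convert back to CSR once, after all structural edits

print(csr.toarray())
```

For many scattered edits the same idea applies: batch them in LIL (or build the entries in COO/DOK) and pay the CSR conversion cost only once.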

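A bit of background: the compressed sparse row (CSR) and compressed sparse column (CSC) formats essentially store only the nonzeros (plus any zeros that were stored explicitly, like the 0 at position (0, 0) above). They do this in a packed way: data holds the values, indices holds each value's column (CSR) or row (CSC) index, and indptr marks where each row (or column) begins, with everything kept in row-by-row (or column-by-column) order. If an operation adds new nonzeros, they typically can't be appended at the end but must be inserted in the middle, shifting every later entry. That is exactly what happened in the example above, and it is what makes the operation expensive.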
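To make the cost concrete, here is a rough pure-NumPy sketch of what inserting one new entry into CSR arrays involves. csr_insert is a hypothetical helper written only for this illustration, not SciPy's actual implementation; it assumes the target position is not already stored and that column indices are sorted within each row.

```python
import numpy as np
from scipy import sparse


def csr_insert(data, indices, indptr, i, j, value):
    """Hypothetical helper: return new (data, indices, indptr) with `value`
    stored at (i, j). Assumes (i, j) is not stored yet and that the column
    indices inside each row are sorted."""
    start, end = indptr[i], indptr[i + 1]
    # Slot inside row i that keeps its column indices sorted.
    pos = start + np.searchsorted(indices[start:end], j)
    # Every stored entry from `pos` onwards moves one slot to the right...
    new_data = np.insert(data, pos, value)
    new_indices = np.insert(indices, pos, j)
    # ...and every row after row i now starts one slot later.
    new_indptr = indptr.copy()
    new_indptr[i + 1:] += 1
    return new_data, new_indices, new_indptr


data = np.array([0, 1, 2, 3, 4])
row = np.array([0, 1, 2, 3, 4])
col = np.array([0, 1, 2, 3, 4])
csr = sparse.csr_matrix((data, (row, col)))

d, ind, ptr = csr_insert(csr.data, csr.indices, csr.indptr, 1, 0, 1)
print(d)    # [0 1 1 2 3 4]
print(ind)  # [0 0 1 2 3 4]
print(ptr)  # [0 1 3 4 5 6]
```

The resulting data and indices match what SciPy produced in the session above. Each np.insert copies the whole array, so a single new entry costs time proportional to the number of stored entries; repeating that element by element is the expense the warning is about.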
