Reputation: 16528
I have a sparse matrix that I am constructing using several diagonals:
A = diags([np.arange(100), np.arange(99), np.arange(99)], offsets=[0, -1, 1])
However, this sparse matrix also has a vector of ones in the last row. Is there any way to store this in the sparse matrix or is my construct to inefficient and I should work with a dense matrix?
Upvotes: 0
Views: 998
Reputation: 231738
sparse.diags
does make a sparse matrix with a special diagonal
format:
In [591]: A = sparse.diags([np.arange(100), np.arange(99), np.arange(99)], offse
...: ts=[0, -1, 1])
In [592]: A
Out[592]:
<100x100 sparse matrix of type '<class 'numpy.float64'>'
with 298 stored elements (3 diagonals) in DIAgonal format>
In [593]: A.A
Out[593]:
array([[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 1., 1., ..., 0., 0., 0.],
[ 0., 1., 2., ..., 0., 0., 0.],
...,
[ 0., 0., 0., ..., 97., 97., 0.],
[ 0., 0., 0., ..., 97., 98., 98.],
[ 0., 0., 0., ..., 0., 98., 99.]])
But the storage isn't significantly more efficient than other sparse formats. Other formats have to store the same 298 values. They'll just index them differently.
We can set the last row in various ways.
We can't index the last row directly with the sparse format.
In [594]: A[-1,:]
...
TypeError: 'dia_matrix' object is not subscriptable
but we can convert it to csr
format, and set its row value:
In [595]: A.tocsr()[-1,:]
Out[595]:
<1x100 sparse matrix of type '<class 'numpy.float64'>'
with 2 stored elements in Compressed Sparse Row format>
In [596]: Ac = A.tocsr()
In [597]: Ac[-1,:]=1
/usr/local/lib/python3.5/dist-packages/scipy/sparse/compressed.py:730: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive.
In [598]: Ac
Out[598]:
<100x100 sparse matrix of type '<class 'numpy.float64'>'
with 393 stored elements in Compressed Sparse Row format>
In [599]: Ac.A
Out[599]:
array([[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 1., 1., ..., 0., 0., 0.],
[ 0., 1., 2., ..., 0., 0., 0.],
...,
[ 0., 0., 0., ..., 97., 97., 0.],
[ 0., 0., 0., ..., 97., 98., 98.],
[ 1., 1., 1., ..., 1., 1., 1.]])
Here I wouldn't worry about the sparsity warning; that's meant more for cases where the action is done iteratively. I could have used tolil()
instead. Keep in mind that the csr
format gets use for calculations. And the coo
format is use when combining blocks of matrices.
I just checked the sparse.dia_matrix
code. For your array A.data
is a (3,100) array. It 'squared off' your ragged input. A.offsets
is a 3 element array.
A.tocoo()
stores the values in 3 (295,) arrays (removing the 3 0's in your definition). A A.tocsr()
stores 2 (295,) arrays plus a (101,) indptr
array. So the dia
format is more compact, but only so long as you can work in the format.
To append the ones row instead, use sparse.vstack
(vstack
uses coo
format to construct the new matrix):
In [619]: B = sparse.vstack((A,np.ones((1,100))))
In [620]: B
Out[620]:
<101x100 sparse matrix of type '<class 'numpy.float64'>'
with 395 stored elements in COOrdinate format>
Out of curiosity I tried vstack
with a dia
output - it doesn't like it, because the squared off dia
data would be too large.
In [621]: B = sparse.vstack((A,np.ones((1,100))),format='dia')
/usr/local/lib/python3.5/dist-packages/scipy/sparse/coo.py:359: SparseEfficiencyWarning: Constructing a DIA matrix with 102 diagonals is inefficient
Assignment with the lil
format doesn't produce any warnings:
In [624]: Al = A.tolil()
In [625]: Al[-1,:]=1
In [626]: Al
Out[626]:
<100x100 sparse matrix of type '<class 'numpy.float64'>'
with 393 stored elements in LInked List format>
This too is converted to csr
for most calculations.
Upvotes: 2