Reputation: 21
I have a scipy sparse matrix data and an integer n, which corresponds to a row in data that I want to delete. To delete this row I tried this:
data = sparse.csr_matrix(np.delete(np.array(data),n, axis=0))
However, this produced this error:
Traceback (most recent call last):
File "...", line 260, in <module>
X_labeled = sparse.csr_matrix(np.delete(np.array(X_labeled),n, axis=0))
File "/anaconda3/lib/python3.6/site-packages/scipy/sparse/compressed.py", line 79, in __init__
self._set_self(self.__class__(coo_matrix(arg1, dtype=dtype)))
File "/anaconda3/lib/python3.6/site-packages/scipy/sparse/coo.py", line 177, in __init__
self.row, self.col = M.nonzero()
SystemError: <built-in method nonzero of numpy.ndarray object at 0x113c883f0> returned a result with an error set
When I run:
data = np.delete(data.toarray(),n, axis=0)
I get this error:
Traceback (most recent call last):
File "...", line 261, in <module>
X_labeled = np.delete(X_labeled.toarray(),n, axis=0)
File "/anaconda3/lib/python3.6/site-packages/numpy/lib/function_base.py", line 4839, in delete
"size %i" % (obj, axis, N))
IndexError: index 86 is out of bounds for axis 0 with size 4
When I run this:
print(type(data))
print(data.shape)
print(data.toarray().shape)
I get this:
<class 'scipy.sparse.csr.csr_matrix'>
(4, 2740)
(4, 2740)
Upvotes: 3
Views: 2477
Reputation: 231395
The correct way to turn a sparse matrix into a dense one is with toarray, not np.array(...):
In [408]: M = sparse.csr_matrix(np.eye(3))
In [409]: M
Out[409]:
<3x3 sparse matrix of type '<class 'numpy.float64'>'
with 3 stored elements in Compressed Sparse Row format>
In [410]: np.array(M)
Out[410]:
array(<3x3 sparse matrix of type '<class 'numpy.float64'>'
with 3 stored elements in Compressed Sparse Row format>, dtype=object)
This is a single-element, object-dtype array that just wraps the sparse matrix, unchanged. It is not a dense copy of the data, so np.delete cannot operate on its rows.
In [411]: M.toarray()
Out[411]:
array([[ 1., 0., 0.],
[ 0., 1., 0.],
[ 0., 0., 1.]])
delete works with this correct array:
In [414]: data = sparse.csr_matrix(np.delete(M.toarray(),1, axis=0))
In [415]: data
Out[415]:
<2x3 sparse matrix of type '<class 'numpy.float64'>'
with 2 stored elements in Compressed Sparse Row format>
In [416]: data.A
Out[416]:
array([[ 1., 0., 0.],
[ 0., 0., 1.]])
Indexing will do the same thing:
In [417]: M[[0,2],:]
Out[417]:
<2x3 sparse matrix of type '<class 'numpy.float64'>'
with 2 stored elements in Compressed Sparse Row format>
In [418]: _.A
Out[418]:
array([[ 1., 0., 0.],
[ 0., 0., 1.]])
In [420]: M[np.array([True,False,True]),:].A
Out[420]:
array([[ 1., 0., 0.],
[ 0., 0., 1.]])
I would guess that the indexing route is faster, but we'd have to do time tests on realistic size arrays to be sure.
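A rough way to run that comparison is sketched below. The matrix size, density, and row index are assumptions chosen only for illustration (the column count and index echo the question), and via_dense and via_index are hypothetical helper names, so treat any timings as indicative rather than conclusive.

```python
import timeit

import numpy as np
from scipy import sparse

# Assumed test case: a random 1000 x 2740 CSR matrix with 1% density.
M = sparse.random(1000, 2740, density=0.01, format="csr", random_state=0)
n = 86  # row to delete; any valid row index would do


def via_dense():
    # Densify, delete the row with numpy, then convert back to CSR.
    return sparse.csr_matrix(np.delete(M.toarray(), n, axis=0))


def via_index():
    # Stay sparse: keep every row except n by boolean indexing.
    keep = np.arange(M.shape[0]) != n
    return M[keep, :]


# Both routes should yield the same matrix before we compare their speed.
assert np.array_equal(via_dense().toarray(), via_index().toarray())

for fn in (via_dense, via_index):
    seconds = timeit.timeit(fn, number=20)
    print(f"{fn.__name__}: {seconds / 20 * 1e3:.2f} ms per call")
```

The relative speed will likely depend on the matrix shape and sparsity, so it is worth rerunning this with dimensions close to the real data.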
Internally, delete is rather complex, but for some inputs it does something like this: it constructs a boolean array with False for the rows you want to delete and then indexes with it. Making the boolean mask:
In [421]: mask=np.ones((3,),bool)
In [422]: mask[1]=False
In [423]: M[mask,:].A
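The mask steps above can be wrapped in a small helper. This is only a sketch: delete_row is a hypothetical name, not a scipy function, and the explicit bounds check is an added assumption, meant to surface a bad index (such as row 86 in a 4-row matrix) with a clear message.

```python
import numpy as np
from scipy import sparse


def delete_row(mat, n):
    """Return a CSR matrix equal to `mat` without row `n` (hypothetical helper)."""
    mat = sparse.csr_matrix(mat)  # ensure CSR, which supports row masking
    n_rows = mat.shape[0]
    if not 0 <= n < n_rows:
        raise IndexError(f"row {n} is out of bounds for a matrix with {n_rows} rows")
    mask = np.ones(n_rows, dtype=bool)  # True means "keep this row"
    mask[n] = False
    return mat[mask, :]


M = sparse.csr_matrix(np.eye(3))
reduced = delete_row(M, 1)
print(reduced.toarray())
```

Because the matrix is never converted to a dense array, this avoids allocating the full dense copy that the np.delete route needs.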
Upvotes: 4