jdoe

Reputation: 21

Delete row from scipy matrix

I have a scipy sparse matrix data and an integer n which corresponds to a row in data that I want to delete. To delete this row I tried this:

data = sparse.csr_matrix(np.delete(np.array(data),n, axis=0))

However, this produced this error:

Traceback (most recent call last):
  File "...", line 260, in <module>
    X_labeled = sparse.csr_matrix(np.delete(np.array(X_labeled),n, axis=0))
  File "/anaconda3/lib/python3.6/site-packages/scipy/sparse/compressed.py", line 79, in __init__
    self._set_self(self.__class__(coo_matrix(arg1, dtype=dtype)))
  File "/anaconda3/lib/python3.6/site-packages/scipy/sparse/coo.py", line 177, in __init__
    self.row, self.col = M.nonzero()
SystemError: <built-in method nonzero of numpy.ndarray object at 0x113c883f0> returned a result with an error set

When I run:

data = np.delete(data.toarray(),n, axis=0)

I get this error:

Traceback (most recent call last):
  File "...", line 261, in <module>
    X_labeled = np.delete(X_labeled.toarray(),n, axis=0)
  File "/anaconda3/lib/python3.6/site-packages/numpy/lib/function_base.py", line 4839, in delete
    "size %i" % (obj, axis, N))
IndexError: index 86 is out of bounds for axis 0 with size 4

When I run this:

print(type(data))
print(data.shape)
print(data.toarray().shape)

I get this:

<class 'scipy.sparse.csr.csr_matrix'>
(4, 2740)
(4, 2740)

Upvotes: 3

Views: 2477

Answers (1)

hpaulj

Reputation: 231395

The correct way to turn a sparse matrix into a dense one is with toarray, not np.array(...):

In [408]: M = sparse.csr_matrix(np.eye(3))
In [409]: M
Out[409]: 
<3x3 sparse matrix of type '<class 'numpy.float64'>'
    with 3 stored elements in Compressed Sparse Row format>
In [410]: np.array(M)
Out[410]: 
array(<3x3 sparse matrix of type '<class 'numpy.float64'>'
    with 3 stored elements in Compressed Sparse Row format>, dtype=object)

This is a single-element, object-dtype array that contains the sparse matrix, unchanged. It is not a dense copy of the matrix's values, so np.delete cannot operate on its rows.

In [411]: M.toarray()
Out[411]: 
array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])

delete works with this correct array:

In [414]: data = sparse.csr_matrix(np.delete(M.toarray(),1, axis=0))
In [415]: data
Out[415]: 
<2x3 sparse matrix of type '<class 'numpy.float64'>'
    with 2 stored elements in Compressed Sparse Row format>
In [416]: data.A
Out[416]: 
array([[ 1.,  0.,  0.],
       [ 0.,  0.,  1.]])

Indexing will do the same thing:

In [417]: M[[0,2],:]
Out[417]: 
<2x3 sparse matrix of type '<class 'numpy.float64'>'
    with 2 stored elements in Compressed Sparse Row format>
In [418]: _.A
Out[418]: 
array([[ 1.,  0.,  0.],
       [ 0.,  0.,  1.]])
In [420]: M[np.array([True,False,True]),:].A
Out[420]: 
array([[ 1.,  0.,  0.],
       [ 0.,  0.,  1.]])

I would guess that the indexing route is faster, but we'd have to do time tests on realistic size arrays to be sure.
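One way to run that comparison is sketched below. The matrix shape, density, and row index are arbitrary assumptions chosen for illustration, and the function names via_dense and via_indexing are hypothetical; results will vary with matrix size and sparsity, so treat this as a starting point rather than a benchmark.

```python
import timeit

import numpy as np
from scipy import sparse

# Hypothetical test matrix: shape and density are arbitrary assumptions.
M = sparse.random(1000, 2740, density=0.01, format="csr", random_state=0)
n = 86  # row to drop (assumed to be a valid row index here)

# Boolean mask that keeps every row except n.
keep = np.ones(M.shape[0], dtype=bool)
keep[n] = False


def via_dense():
    # Convert to dense, delete the row, convert back to CSR.
    return sparse.csr_matrix(np.delete(M.toarray(), n, axis=0))


def via_indexing():
    # Stay sparse and select the rows to keep.
    return M[keep, :]


for func in (via_dense, via_indexing):
    seconds = timeit.timeit(func, number=5)
    print(f"{func.__name__}: {seconds / 5:.4f} s per call")
```

Both functions should return the same matrix, so the only thing being compared is the cost of the round trip through a dense array.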

Internally, delete is rather complex, but for some inputs it does something like this: it constructs a boolean array with False for the rows you want to delete.

Making the boolean mask:

In [421]: mask=np.ones((3,),bool)
In [422]: mask[1]=False
In [423]: M[mask,:].A
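The mask steps above can be wrapped in a small helper. This is a minimal sketch, not a scipy function: the name delete_row is hypothetical, and the bounds check is an addition prompted by the second traceback in the question, where index 86 was requested on a matrix with only 4 rows.

```python
import numpy as np
from scipy import sparse


def delete_row(mat, n):
    """Return a copy of sparse matrix `mat` without row `n` (hypothetical helper)."""
    mat = sparse.csr_matrix(mat)  # CSR supports row selection by mask
    n_rows = mat.shape[0]
    if not 0 <= n < n_rows:
        # Fail early with a clear message instead of a confusing downstream error.
        raise IndexError(f"row {n} is out of bounds for a matrix with {n_rows} rows")
    mask = np.ones(n_rows, dtype=bool)  # True means "keep this row"
    mask[n] = False
    return mat[mask, :]


M = sparse.csr_matrix(np.eye(3))
reduced = delete_row(M, 1)
print(reduced.toarray())
```

With the question's data of shape (4, 2740), calling delete_row(data, 86) would raise the IndexError from the helper, which points at the row index rather than at the conversion to a dense array.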

Upvotes: 4
