Nico Schlömer
Nico Schlömer

Reputation: 58791

SciPy sparse matrix (COO,CSR): Clear row

For creating a scipy sparse matrix, I have an array or row and column indices I and J along with a data array V. I use those to construct a matrix in COO format and then convert it to CSR,

matrix = sparse.coo_matrix((V, (I, J)), shape=(n, n))
matrix = matrix.tocsr()

I have a set of row indices for which the only entry should be a 1.0 on the diagonal. So far, I go through I, find all indices that need wiping, and do just that:

def find(lst, a):
    # From <http://stackoverflow.com/a/16685428/353337>
    return [i for i, x in enumerate(lst) if x in a]

# wipe_rows = [1, 55, 32, ...]  # something something

indices = find(I, wipe_rows)  # takes too long
I = numpy.delete(I, indices).tolist()
J = numpy.delete(J, indices).tolist()
V = numpy.delete(V, indices).tolist()

# Add entry 1.0 to the diagonal for each wipe row
I.extend(wipe_rows)
J.extend(wipe_rows)
V.extend(numpy.ones(len(wipe_rows)))

# construct matrix via coo

That works alright, but find tends to take a while.

Any hints on how to speed this up? (Perhaps wiping the rows in COO or CSR format is a better idea.)

Upvotes: 1

Views: 2579

Answers (2)

Nico Schl&#246;mer
Nico Schl&#246;mer

Reputation: 58791

If you intend to clear multiple rows at once, this

def _wipe_rows_csr(matrix, rows):
    assert isinstance(matrix, sparse.csr_matrix)

    # delete rows
    for i in rows:
        matrix.data[matrix.indptr[i]:matrix.indptr[i+1]] = 0.0

    # Set the diagonal
    d = matrix.diagonal()
    d[rows] = 1.0
    matrix.setdiag(d)

    return

is by far the fastest method. It doesn't really remove the lines, but sets all entries to zeros, then fiddles with the diagonal.

If the entries are actually to be removed, one has to do some array manipulation. This can be quite costly, but if speed is no issue: This

def _wipe_row_csr(A, i):
    '''Wipes a row of a matrix in CSR format and puts 1.0 on the diagonal.
    '''
    assert isinstance(A, sparse.csr_matrix)

    n = A.indptr[i+1] - A.indptr[i]

    assert n > 0

    A.data[A.indptr[i]+1:-n+1] = A.data[A.indptr[i+1]:]
    A.data[A.indptr[i]] = 1.0
    A.data = A.data[:-n+1]

    A.indices[A.indptr[i]+1:-n+1] = A.indices[A.indptr[i+1]:]
    A.indices[A.indptr[i]] = i
    A.indices = A.indices[:-n+1]

    A.indptr[i+1:] -= n-1

    return

replaces a given row i of the matrix matrix by the entry 1.0 on the diagonal.

Upvotes: 2

hpaulj
hpaulj

Reputation: 231385

np.in1d should be a faster way of finding the indices:

In [322]: I   # from a np.arange(12).reshape(4,3) matrix
Out[322]: array([0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3], dtype=int32)

In [323]: indices=[i for i, x in enumerate(I) if x in [1,2]]

In [324]: indices
Out[324]: [2, 3, 4, 5, 6, 7]

In [325]: ind1=np.in1d(I,[1,2])

In [326]: ind1
Out[326]: 
array([False, False,  True,  True,  True,  True,  True,  True, False,
       False, False], dtype=bool)

In [327]: np.where(ind1)   # same as indices
Out[327]: (array([2, 3, 4, 5, 6, 7], dtype=int32),)

In [328]: I[~ind1]  # same as the delete
Out[328]: array([0, 0, 3, 3, 3], dtype=int32)

Direct manipulation of the coo inputs like this often a good way. But another is to take advantage of the csr math abilities. You should be able to construct a diagonal matrix that zeros out the correct rows, and then adds the ones back in.

Here's what I have in mind:

In [357]: A=np.arange(16).reshape(4,4)
In [358]: M=sparse.coo_matrix(A)
In [359]: M.A
Out[359]: 
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

In [360]: d1=sparse.diags([(1,0,0,1)],[0],(4,4))
In [361]: d2=sparse.diags([(0,1,1,0)],[0],(4,4))

In [362]: (d1*M+d2).A
Out[362]: 
array([[  0.,   1.,   2.,   3.],
       [  0.,   1.,   0.,   0.],
       [  0.,   0.,   1.,   0.],
       [ 12.,  13.,  14.,  15.]])

In [376]: x=np.ones((4,),bool);x[[1,2]]=False
In [378]: d1=sparse.diags([x],[0],(4,4),dtype=int)
In [379]: d2=sparse.diags([~x],[0],(4,4),dtype=int)

Doing this with lil format looks easy:

In [593]: Ml=M.tolil()
In [594]: Ml.data[wipe]=[[1]]*len(wipe)
In [595]: Ml.rows[wipe]=[[i] for i in wipe]

In [596]: Ml.A
Out[596]: 
array([[ 0,  1,  2,  3],
       [ 0,  1,  0,  0],
       [ 0,  0,  1,  0],
       [12, 13, 14, 15]], dtype=int32)

It's sort of what you are doing with csr format, but it's easy to replace each row list with the appropriate [1] and [i] list. But conversion times (tolil etc) can hurt run times.

Upvotes: 0

Related Questions