Sonu Mishra

Reputation: 1779

Replace zero elements in rows of a csr_matrix with entries from a vector

I have a sparse.csr_matrix M and a vector V with the following dimensions.

M: 0.15M x 1.3M
V: 0.15M

I want to replace all 0 elements in rows of M with the corresponding entry in V.

M = [[0, 1, 2],
     [3, 4, 0],
     [6, 0, 8]]
V = [[11],
     [22],
     [33]]

I want to modify M to M' such that

M' = [[11,  1,  2],
      [ 3,  4, 22],
      [ 6, 33,  8]]

This can easily be done with loops, but I am wondering whether there is a more elegant, Pythonic way. As my data are huge, I am looking for a very fast way to accomplish this task.

The loop version will look like:

for i in range(0,3):
    for j in range(0,3):
        if M[i,j] == 0 and V[i] !=0:
            M[i,j] = V[i]

Upvotes: 0

Views: 690

Answers (1)

hpaulj

Reputation: 231738

Here's something that should be fast and works without making M dense. The result, though, will be dense; there's no way around that.

Expand V into a matrix of the same size as M:

In [711]: Z = np.repeat(V,M.shape[1],axis=1)

In [712]: idx=M.nonzero()   

In [713]: Z[idx]=M.data

In [714]: Z
Out[714]: 
array([[11,  1,  2],
       [ 3,  4, 22],
       [ 6, 33,  8]])

It finds where all the nonzero values of M are located (basically the row and col attributes of M.tocoo()), and then just replaces the fill values in Z with the corresponding data values from M.

This would fail if M hasn't been pruned, that is, if some stored elements have been set to zero explicitly: M.data would then be longer than the index arrays returned by M.nonzero(). That's because the full code for M.nonzero is:

    A = self.tocoo()
    nz_mask = A.data != 0
    return (A.row[nz_mask],A.col[nz_mask])
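
To make the mismatch concrete, here is a small sketch. The matrix is made up for illustration (it is not the one from the question) and is built directly from CSR arrays so that it stores an explicit zero:

```python
import numpy as np
from scipy import sparse

# Hypothetical 2x3 CSR matrix whose first row stores an explicit zero at column 1.
data = np.array([1, 0, 2])
indices = np.array([0, 1, 2])
indptr = np.array([0, 3, 3])
M = sparse.csr_matrix((data, indices, indptr), shape=(2, 3))

rows, cols = M.nonzero()
stored = M.data.size   # counts every stored entry, including the explicit zero
nonzero = rows.size    # the mask in nonzero() drops the explicit zero

# Pruning in place brings M.data back in line with M.nonzero().
M.eliminate_zeros()
pruned = M.data.size

print(stored, nonzero, pruned)
```

Here `stored` and `nonzero` differ, so `Z[idx] = M.data` would raise a shape mismatch; after `eliminate_zeros()` the two lengths agree again.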

It might be safer to use:

In [717]: Mc=M.tocoo()

In [718]: Z[Mc.row, Mc.col] = Mc.data

In [719]: Z
Out[719]: 
array([[11,  1,  2],
       [ 3,  4, 22],
       [ 6, 33,  8]])

That would protect against possible reordering of data when converting from csr to coo.
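
Putting the steps together, a self-contained sketch might look like the following. The function name `fill_zeros_with_row_values` is made up for this example, and calling `eliminate_zeros()` first is my assumption about how stored zeros should be treated (as zeros to be replaced); it is not something the snippets above do.

```python
import numpy as np
from scipy import sparse

def fill_zeros_with_row_values(M, V):
    """Return a dense array holding M's nonzero entries, with V[i] elsewhere in row i.

    Hypothetical helper: M is any scipy sparse matrix, V has one entry per row.
    """
    Mc = sparse.csr_matrix(M, copy=True)   # copy so the caller's matrix is untouched
    Mc.eliminate_zeros()                   # assumption: stored zeros count as zeros
    Mc = Mc.tocoo()

    fill = np.asarray(V).reshape(-1, 1)
    Z = np.repeat(fill, Mc.shape[1], axis=1)         # dense, same shape as M
    Z = Z.astype(np.result_type(Z.dtype, Mc.dtype))  # avoid truncating M's values
    Z[Mc.row, Mc.col] = Mc.data                      # overwrite fill with M's data
    return Z

M = sparse.csr_matrix(np.array([[0, 1, 2],
                                [3, 4, 0],
                                [6, 0, 8]]))
V = np.array([[11], [22], [33]])
Z = fill_zeros_with_row_values(M, V)
print(Z)
```

A row whose V entry is 0 simply keeps its zeros, which matches the `V[i] != 0` guard in the question's loop. Keep in mind that Z is fully dense: if the M in the question's dimensions means million, that is roughly 1.95e11 elements, so for data of that size the approach would likely have to be applied to blocks of rows rather than to the whole matrix at once.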

Upvotes: 1
