user3787291

Reputation: 227

Normalize scipy sparse matrix with number of nonzero elements

I want to divide each row of a csr_matrix by the number of non-zero entries in that row.

For example, consider a csr_matrix A:

A = [[6, 0, 0, 4, 0], [3, 18, 0, 9, 0]]
Result = [[3, 0, 0, 2, 0], [1, 6, 0, 3, 0]]

What's the shortest and most efficient way to do it?

Upvotes: 1

Views: 990

Answers (2)

Tai

Reputation: 7994

Divakar gives an in-place method. My approach creates a new matrix instead.

from scipy import sparse
A = sparse.csr_matrix([[6, 0, 0, 4, 0], [3, 18, 0, 9, 0]])
A.multiply(1.0/(A != 0).sum(axis=1)) 

We multiply each row by the reciprocal of its number of non-zero entries. Note that a row with no non-zero entries leads to a division by zero, so you may want to guard against that case.

As Divakar pointed out, 1.0 rather than 1 is needed in A.multiply(1.0/...) so that the division is not integer division under Python 2.
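
To make the division-by-zero concern concrete, here is a minimal sketch of one way to guard against empty rows. The helper name normalize_rows_by_nnz and the sample matrix with an all-zero middle row are my own illustration, not part of the question. It assumes empty rows should simply stay zero.

```python
import numpy as np
from scipy import sparse

def normalize_rows_by_nnz(A):
    """Return a new CSR matrix with each row divided by its non-zero count.

    Hypothetical helper: rows without non-zero entries are left as zeros.
    """
    A = sparse.csr_matrix(A, dtype=float)
    # Per-row count of non-zero entries, flattened to a 1-D float array.
    counts = np.asarray((A != 0).sum(axis=1)).ravel().astype(float)
    # Reciprocal of each count; empty rows get a scale of 0 instead of 1/0.
    scale = np.zeros_like(counts)
    nonempty = counts > 0
    scale[nonempty] = 1.0 / counts[nonempty]
    # Broadcast the column of scales across each row.
    return sparse.csr_matrix(A.multiply(scale.reshape(-1, 1)))

A = sparse.csr_matrix([[6, 0, 0, 4, 0], [0, 0, 0, 0, 0], [3, 18, 0, 9, 0]])
result = normalize_rows_by_nnz(A)
print(result.toarray())
```

The input matrix is not modified; the empty row comes back as all zeros rather than inf or nan.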

Upvotes: 2

Divakar

Reputation: 221564

Get the per-row counts with the getnnz method, then repeat each count once per stored entry and divide in-place into the matrix's flat array of stored values, available through the data attribute -

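import numpy as np

s = A.getnnz(axis=1)       # number of stored entries in each row
A.data /= np.repeat(s, s)  # one divisor per stored entry, in row order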

Inspired by Approach #2 in the solution post to Row Division in Scipy Sparse Matrix.

Sample run -

In [15]: from scipy.sparse import csr_matrix

In [16]: A = csr_matrix([[6, 0, 0, 4, 0], [3, 18, 0, 9, 0]])

In [18]: s = A.getnnz(axis=1)
    ...: A.data /= np.repeat(s, s)

In [19]: A.toarray()
Out[19]: 
array([[3, 0, 0, 2, 0],
       [1, 6, 0, 3, 0]])

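Note: On Python 3, / is true division, and NumPy will not cast the float result back into an integer data array in-place. For integer data that should behave the same on Python 2 and 3, we might want to use floor division (//) -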

A.data //=  ...
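
Floor division truncates whenever an entry is not an exact multiple of its row count. If exact results are wanted, one option is to keep the data as floats, as in the sketch below. The sample matrix (with an empty row and a 5 that is not divisible by 3) is my own illustration. The eliminate_zeros call is an added precaution, since getnnz counts explicitly stored zeros as well.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Float dtype so that in-place true division is allowed and exact.
A = csr_matrix([[6, 0, 0, 4, 0],
                [0, 0, 0, 0, 0],
                [5, 18, 0, 9, 0]], dtype=float)

A.eliminate_zeros()        # drop explicitly stored zeros so counts match true non-zeros
s = A.getnnz(axis=1)       # stored entries per row; 0 for the empty row
A.data /= np.repeat(s, s)  # empty rows repeat zero times, so nothing is divided by 0

print(A.toarray())
```

Because np.repeat(s, s) emits nothing for a row with zero entries, this variant needs no separate check for empty rows.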

Upvotes: 6
