Normalize scipy sparse matrix with number of nonzero elements

Question

I want to divide each row of a csr_matrix by the number of nonzero entries in that row.

For example, consider a csr_matrix A:

A = [[6, 0, 0, 4, 0], [3, 18, 0, 9, 0]]
Result = [[3, 0, 0, 2, 0], [1, 6, 0, 3, 0]]

What's the shortest, most efficient way to do it?

Divakar · Accepted Answer

Get the per-row counts with the getnnz method, then repeat each count once per stored value and divide in place into the data attribute, which holds the stored values as a flat array -

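import numpy as np

s = A.getnnz(axis=1)          # number of stored entries in each row
A.data /= np.repeat(s, s)     # in place; for integer data use //= (see note below)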

Inspired by Approach #2 in the solution post for Row Division in Scipy Sparse Matrix.
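To see why np.repeat(s, s) lines up with A.data: CSR keeps the stored values row by row, so repeating each row's count that many times yields exactly one divisor per stored value. A minimal sketch using the matrix from the question (the name divisors is only for illustration) -

```python
import numpy as np
from scipy.sparse import csr_matrix

A = csr_matrix([[6, 0, 0, 4, 0], [3, 18, 0, 9, 0]])

# Stored values, laid out row by row: 6, 4 (row 0) then 3, 18, 9 (row 1).
print(A.data)

# Stored entries per row: 2 for row 0, 3 for row 1.
s = A.getnnz(axis=1)
print(s)

# One divisor per stored value: row 0's count twice, row 1's count three times.
divisors = np.repeat(s, s)
print(divisors)

# The same per-row counts can also be read off the CSR row pointer.
assert np.array_equal(s, np.diff(A.indptr))
```

Because divisors has the same length and ordering as A.data, the element-wise division touches only the stored values and never builds a dense array.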

Sample run -

In [15]: from scipy.sparse import csr_matrix

In [16]: A = csr_matrix([[6, 0, 0, 4, 0], [3, 18, 0, 9, 0]])

In [18]: s = A.getnnz(axis=1)
    ...: A.data /= np.repeat(s, s)

In [19]: A.toarray()
Out[19]: 
array([[3, 0, 0, 2, 0],
       [1, 6, 0, 3, 0]])

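Note: On Python 3, / is true division, and NumPy will not write its float result in place into an integer array, so A.data /= ... raises a casting error when A holds integers. For integer data that should work on both Python 2 and 3, we might want to use // -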

A.data //=  ...
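Floor division truncates whenever a value is not an exact multiple of its row count. If fractional results matter, one option is to divide a float copy instead. A minimal sketch, assuming the input can be converted to CSR; the helper name normalize_rows_by_nnz is hypothetical and not part of SciPy -

```python
import numpy as np
from scipy.sparse import csr_matrix

def normalize_rows_by_nnz(A):
    """Return a float CSR copy of A with each row divided by its stored-entry count."""
    out = A.tocsr().astype(float)   # astype copies by default, so A is left untouched
    s = out.getnnz(axis=1)          # stored entries per row (explicit zeros are counted too)
    out.data /= np.repeat(s, s)     # empty rows repeat zero times, so nothing divides by zero
    return out

# Same rows as the question, plus an empty row and a row that does not divide evenly.
A = csr_matrix([[6, 0, 0, 4, 0],
                [3, 18, 0, 9, 0],
                [0, 0, 0, 0, 0],
                [5, 0, 0, 0, 2]])
R = normalize_rows_by_nnz(A)
print(R.toarray())
```

Here the last row becomes [2.5, 0, 0, 0, 1.0], where //= on the integer matrix would have given [2, 0, 0, 0, 1].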
