hanqiang

Reputation: 567

Efficient way to get the max of each row of a large sparse matrix

I have a large sparse matrix and I want to get the maximum value of each row. In numpy, I can call numpy.max(mat, axis=1), but I cannot find a similar function for scipy sparse matrices. Is there an efficient way to get the max of each row of a large sparse matrix?

Upvotes: 6

Views: 2342

Answers (2)

JakeM

Reputation: 594

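I just came across this same problem. Jaime's solution breaks if the matrix ends with one or more completely empty rows: the last offset passed to np.maximum.reduceat then equals len(data), and reduceat raises an IndexError. Here's a workaround that only reduces over non-empty rows: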

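```python
import numpy as np

def sparse_max_row(csr_mat):
    nonempty = np.diff(csr_mat.indptr) > 0  # rows with at least one stored entry
    ret = np.zeros(csr_mat.shape[0], dtype=csr_mat.dtype)  # empty rows stay 0
    # only the start offsets of non-empty rows, so no offset equals len(csr_mat.data)
    ret[nonempty] = np.maximum.reduceat(csr_mat.data, csr_mat.indptr[:-1][nonempty])
    return ret
```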
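A quick sanity check of the empty-row case, as a sketch: the small matrix below is made up for illustration, with an empty row in the middle and an empty last row. The unmasked call np.maximum.reduceat(m.data, m.indptr[:-1]) fails because the last offset equals len(m.data), while the masked version of the workaround (restated here so the snippet runs on its own) returns 0 for both empty rows. All values are non-negative, so the result matches the dense row maximum.

```python
import numpy as np
from scipy.sparse import csr_matrix

def sparse_max_row(csr_mat):
    # Same approach as the workaround: reduce only over rows that have stored entries
    nonempty = np.diff(csr_mat.indptr) > 0
    ret = np.zeros(csr_mat.shape[0], dtype=csr_mat.dtype)
    ret[nonempty] = np.maximum.reduceat(csr_mat.data, csr_mat.indptr[:-1][nonempty])
    return ret

# Hypothetical example: row 1 and the last row (row 3) are empty
dense = np.array([[0, 5, 2],
                  [0, 0, 0],
                  [7, 0, 1],
                  [0, 0, 0]])
m = csr_matrix(dense)
print(m.indptr)           # [0 2 2 4 4]: row 3 starts at offset 4 == len(m.data)
print(sparse_max_row(m))  # [5 0 7 0]

try:
    np.maximum.reduceat(m.data, m.indptr[:-1])  # the unmasked call
except IndexError as err:
    print("unmasked reduceat failed:", err)
```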

Upvotes: 2

Jaime

Reputation: 67417

If your matrix, let's call it a, is stored in CSR format, then a.data holds all the stored (non-zero) entries ordered by row, and a.indptr holds the offset in a.data of the first entry of every row, so row i occupies a.data[a.indptr[i]:a.indptr[i+1]]. You can use this to calculate what you are after as follows:

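```python
import numpy as np

def sparse_max_row(csr_mat):
    # Max over each row's slice of data; assumes the last row is not empty
    ret = np.maximum.reduceat(csr_mat.data, csr_mat.indptr[:-1])
    # For an empty row, reduceat returns the first value of the next non-empty row, so reset it to 0
    ret[np.diff(csr_mat.indptr) == 0] = 0
    return ret
```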
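To make the layout described above concrete, here is a small sketch on a made-up matrix where every row has at least one stored entry. It prints a.data and a.indptr, takes the maximum of each row's slice explicitly, and shows that a single np.maximum.reduceat call over the starting offsets gives the same result.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical 3x4 matrix; every row has at least one stored entry
a = csr_matrix(np.array([[0, 3, 0, 1],
                         [4, 0, 0, 0],
                         [0, 2, 6, 5]]))

print(a.data)    # stored values, row by row: [3 1 4 2 6 5]
print(a.indptr)  # row i occupies a.data[a.indptr[i]:a.indptr[i+1]]: [0 2 3 6]

# Slice-by-slice maximum, written out explicitly
explicit = np.array([a.data[a.indptr[i]:a.indptr[i + 1]].max() for i in range(a.shape[0])])

# Same result in one vectorized call: reduceat starts a new segment at each offset
vectorized = np.maximum.reduceat(a.data, a.indptr[:-1])
print(explicit, vectorized)  # [3 4 6] [3 4 6]
```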

Upvotes: 4
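One caveat applies to both answers: they only look at stored entries, so a row such as [-3, 0, -1] gets -1 instead of the 0 that numpy.max reports for the dense row. The sketch below (the name sparse_max_row_with_zeros is hypothetical) works on a copy, calls sum_duplicates() so that repeated (row, col) entries are summed first, and then clamps every row with fewer stored entries than columns at 0. Newer SciPy releases also provide a max(axis=1) method on sparse matrices that accounts for implicit zeros; the last line assumes such a version.

```python
import numpy as np
from scipy.sparse import csr_matrix

def sparse_max_row_with_zeros(csr_mat):
    """Row-wise max that treats unstored entries as zeros, like numpy.max on the dense matrix."""
    csr_mat = csr_mat.copy()
    csr_mat.sum_duplicates()  # duplicate (row, col) entries must be summed before taking a max
    counts = np.diff(csr_mat.indptr)  # stored entries per row
    nonempty = counts > 0
    ret = np.zeros(csr_mat.shape[0], dtype=csr_mat.dtype)
    if nonempty.any():
        ret[nonempty] = np.maximum.reduceat(csr_mat.data, csr_mat.indptr[:-1][nonempty])
    # A row with fewer stored entries than columns also contains an implicit 0
    not_full = counts < csr_mat.shape[1]
    ret[not_full] = np.maximum(ret[not_full], 0)
    return ret

# Hypothetical example with negative values
dense = np.array([[-3, 0, -1],
                  [-2, -5, -4],
                  [0, 0, 0],
                  [1, 0, 9]])
m = csr_matrix(dense)
print(sparse_max_row_with_zeros(m))     # [ 0 -2  0  9]
print(dense.max(axis=1))                # [ 0 -2  0  9]
print(m.max(axis=1).toarray().ravel())  # built-in method in newer SciPy versions
```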
