Mithril

Reputation: 13778

Get all unique values from a sparse matrix [python/scipy]

I am trying to make a machine learning library work with scipy sparse matrices.

The code below checks whether y contains more than one class, because classification makes no sense when there is only one class.

import numpy as np
y = np.array([0,1,0,1,0,1])
uniques = set(y)  # get {0, 1}

if len(uniques) == 1:
    raise RuntimeError("Only one class detected, aborting...")

But set(y) does not work if y is a scipy sparse matrix.

How can I efficiently get all unique values if y is a scipy sparse matrix?

PS: I know set(y.todense()) may work, but it costs too much memory.

UPDATE:

>>> import scipy.sparse as sp
>>> y = sp.csr_matrix(np.array([0,1,0,1,0,1]))
>>> set(y.data)
{1}
>>> y.data
array([1, 1, 1])

Upvotes: 2

Views: 2933

Answers (1)

hpaulj

Reputation: 231550

Sparse matrices store their values in different ways, but most formats have a .data attribute that holds the explicitly stored (normally nonzero) values.

set(y.data)

might be all that you need. This should work for coo, csr, and csc. For other formats you may need to convert the matrix first (e.g. y.tocoo()).

If that does not work, give us more details on the matrix format and problems.
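As the UPDATE in the question shows, .data holds only the stored entries, so a 0 class that exists only as implicit (unstored) zeros never appears in set(y.data). A minimal sketch of one way to account for that, assuming y is a 2-D scipy sparse matrix: compare the number of stored entries (y.nnz) with the total number of cells, and add 0 when some cells are unstored. The helper name sparse_unique is hypothetical, not part of scipy.

```python
import math

import numpy as np
import scipy.sparse as sp


def sparse_unique(y):
    """Return sorted unique values of a sparse matrix, counting implicit zeros.

    Hypothetical helper for illustration; not a scipy function.
    """
    # Only coo/csr/csc are assumed to expose a flat .data array of stored
    # values; anything else is converted to COO first.
    if y.format not in ("coo", "csr", "csc"):
        y = y.tocoo()

    # Unique values among the explicitly stored entries only.
    uniques = np.unique(y.data)

    # If fewer entries are stored than there are cells, at least one cell
    # is an implicit zero, so 0 belongs to the set of values as well.
    if y.nnz < math.prod(y.shape):
        uniques = np.union1d(uniques, [0])
    return uniques


y = sp.csr_matrix(np.array([0, 1, 0, 1, 0, 1]))
uniques = sparse_unique(y)

if len(uniques) == 1:
    raise RuntimeError("Only one class detected, aborting...")
```

This never densifies the matrix, so the extra memory is proportional to the number of stored entries rather than to the full shape. One caveat: a COO matrix with duplicate coordinates keeps the unsummed values in .data, so converting with y.tocsr() first may be safer in that case.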

Upvotes: 5
