sdgaw erzswer

Reputation: 2382

Scipy sparse matrix row broadcasting

I've recently been trying to do the following (efficiently):

  1. Read a sparse (csr) matrix

  2. Select a subset of rows

  3. Construct another matrix (all zeros)

  4. Fill 3. with the subset obtained in 2.

I can almost achieve this as follows:

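import numpy as np
import scipy.io
import scipy.sparse as sp

# loadmat returns a dict of variables; "matrix_name" is a placeholder for the key holding the csr matrix
input_matrix = scipy.io.loadmat(some_matrix)["matrix_name"]

# sample row indices without replacement (rows are counted by shape[0])
random_indices = np.random.choice(input_matrix.shape[0], num_samples, replace=False)

second_matrix = sp.dok_matrix(input_matrix.shape)

## this takes up too much memory!
second_matrix[random_indices] = input_matrix[random_indices]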

How does one do this more efficiently? I would rather not call .todense() at any point, as that would also blow up memory. Intuitively, it should be possible to mask part of the matrix. In numpy (dense), I would simply fill the remainder with zeros, but I am not sure this is the right approach for csr matrices.

Thanks!

Upvotes: 0

Views: 35

Answers (1)

sdgaw erzswer

Reputation: 2382

I tested the .dok and .csr formats, but the only variant that did not blow up in memory was:

second_matrix[random_indices] = input_matrix.tolil()[random_indices]

Hence, converting the input to a .lil matrix is what worked.
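For reference, a self-contained version of that assignment might look like the sketch below. The random test matrix, its shape, and the sample count of 20 are assumptions standing in for the real data, and building the target as a lil matrix is one possible choice. The final conversion back to csr is optional.

```python
import numpy as np
import scipy.sparse as sp

# Stand-in for the loaded matrix (assumption: the real one comes from a .mat file).
input_matrix = sp.random(200, 50, density=0.05, format="csr", random_state=0)
num_samples = 20

# Pick row indices without replacement.
rng = np.random.default_rng(0)
random_indices = rng.choice(input_matrix.shape[0], num_samples, replace=False)

# Empty target with the same shape, filled from the lil view of the input.
second_matrix = sp.lil_matrix(input_matrix.shape, dtype=input_matrix.dtype)
second_matrix[random_indices] = input_matrix.tolil()[random_indices]

# Optional: convert back to csr for arithmetic or saving.
second_matrix = second_matrix.tocsr()
```

Rows that were not selected stay empty, so second_matrix has the same shape as the input but only holds the sampled rows.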

The suggestion by hpaulj also makes sense, but I have not tested it yet (lil works just fine for my task).
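The masking idea from the question can also be sketched without leaving csr, by multiplying on the left with a sparse diagonal matrix that has ones on the selected rows and zeros elsewhere. This is an alternative offered as an assumption, not something tested in this answer, and not necessarily what was suggested in the comments. The helper name keep_rows and the random test matrix are hypothetical.

```python
import numpy as np
import scipy.sparse as sp

def keep_rows(matrix, row_indices):
    """Return a csr matrix shaped like `matrix` with every row outside `row_indices` zeroed."""
    mask = np.zeros(matrix.shape[0], dtype=matrix.dtype)
    mask[row_indices] = 1
    # n_rows x n_rows diagonal selector: 1 on kept rows, 0 elsewhere.
    selector = sp.diags(mask, format="csr")
    result = selector @ matrix.tocsr()
    # Drop any explicitly stored zeros left over from the product.
    result.eliminate_zeros()
    return result

# Stand-in data (assumption): a random sparse matrix instead of the loaded one.
input_matrix = sp.random(200, 50, density=0.05, format="csr", random_state=0)
rng = np.random.default_rng(0)
random_indices = rng.choice(input_matrix.shape[0], 20, replace=False)

second_matrix = keep_rows(input_matrix, random_indices)
```

The product is sparse-by-sparse, so nothing here calls .todense(); whether it beats the lil route on a given matrix would need to be measured.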

Upvotes: 0
