Reputation: 2382
I've recently been trying to do the following (efficiently):
1. Read a sparse (csr) matrix
2. Select a subset of rows
3. Construct another matrix (all zeros)
4. Fill 3. with the subset obtained in 2.
I can almost achieve this as follows:
import numpy as np
import scipy.io
import scipy.sparse as sp

input_matrix = scipy.io.loadmat(some_matrix)
# rows are axis 0, so sample the row indices from shape[0]
random_indices = np.random.choice(input_matrix.shape[0], num_samples, replace=False)
second_matrix = sp.dok_matrix(input_matrix.shape)
## this takes up too much memory!
second_matrix[random_indices] = input_matrix[random_indices]
How does one do this more efficiently? I would rather not call .todense() at any point, since that would also blow up memory. Intuitively, it should be possible to mask part of the matrix. In numpy (dense), I would simply fill the remainder with zeros, but I am not sure whether that is the right approach for csr matrices.
Thanks!
Upvotes: 0
Views: 35
Reputation: 2382
I tested the .dok and .csr formats, yet the only approach that does not blow up memory is:
second_matrix[random_indices] = input_matrix.tolil()[random_indices]
Hence, the .lil matrix.
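For reference, a minimal self-contained sketch of the lil approach. The small random test matrix, its shape, and num_samples are assumptions made for illustration, and second_matrix is assumed to be a lil_matrix as well; the real input would come from the loaded file instead.

```python
import numpy as np
import scipy.sparse as sp

# Stand-in for the loaded matrix (assumption: a small random csr matrix).
input_matrix = sp.random(50, 20, density=0.1, format="csr", random_state=0)
num_samples = 10

# Rows are axis 0, so the row indices are drawn from shape[0].
rng = np.random.default_rng(0)
random_indices = rng.choice(input_matrix.shape[0], num_samples, replace=False)

# All-zero target in lil format, which supports row-wise assignment.
second_matrix = sp.lil_matrix(input_matrix.shape, dtype=input_matrix.dtype)
second_matrix[random_indices] = input_matrix.tolil()[random_indices]

# Convert back to csr if later steps rely on fast arithmetic.
second_matrix = second_matrix.tocsr()
```

The final conversion back to csr is optional and only matters if the result is used in matrix products or other arithmetic afterwards.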
The suggestion by hpaulj also makes sense, but I have not tested it yet (lil works just fine for my task).
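Regarding the masking idea from the question, one possible alternative (not necessarily what hpaulj suggested, and not benchmarked here) is to left-multiply by a sparse diagonal selector that has ones only at the chosen row positions. The helper name keep_rows and the test matrix below are hypothetical; this is a sketch that stays in csr format throughout.

```python
import numpy as np
import scipy.sparse as sp

def keep_rows(matrix, row_indices):
    """Return a csr matrix of the same shape with only the given rows kept.

    keep_rows is a hypothetical helper name; all other rows become empty.
    """
    n_rows = matrix.shape[0]
    row_indices = np.asarray(row_indices)
    ones = np.ones(len(row_indices), dtype=matrix.dtype)
    # Diagonal selector: entry (i, i) is 1 for each kept row i, nothing elsewhere.
    selector = sp.csr_matrix((ones, (row_indices, row_indices)),
                             shape=(n_rows, n_rows))
    # Multiplying from the left zeroes every row that is not selected.
    return (selector @ matrix).tocsr()

# Usage on a small random matrix (assumption: stand-in for the real data).
demo_matrix = sp.random(30, 8, density=0.2, format="csr", random_state=1)
demo_indices = np.array([0, 3, 7, 21])
masked = keep_rows(demo_matrix, demo_indices)
```

The selector stores only one entry per kept row, so no dense intermediate is created at any point.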
Upvotes: 0