Reputation: 1779
I have a sparse.csr_matrix M and a vector V with the following dimensions:
M: 0.15M x 1.3M
V: 0.15M
I want to replace every zero element in each row of M with the corresponding entry of V, i.e. zeros in row i become V[i]. For example:
M = [[0, 1, 2],
[3, 4, 0],
[6, 0, 8]]
V = [[11],
[22],
[33]]
I want to modify M into M' such that
M' = [[11, 1, 2],
[ 3, 4, 22],
[ 6, 33, 8]]
This can easily be done with loops, but I am wondering whether there is a more elegant, Pythonic way. As my data are huge, I am looking for a very fast way to accomplish this task.
The loop version would look like:
for i in range(0, 3):
    for j in range(0, 3):
        if M[i, j] == 0 and V[i] != 0:
            M[i, j] = V[i]
Upvotes: 0
Views: 690
Reputation: 231738
Here's something that should be fast and that works without making M dense. The result, though, will be dense; there's no way around that.
Expand V into a matrix of the same size as M:
In [711]: Z = np.repeat(V,M.shape[1],axis=1)
In [712]: idx=M.nonzero()
In [713]: Z[idx]=M.data
In [714]: Z
Out[714]:
array([[11, 1, 2],
[ 3, 4, 22],
[ 6, 33, 8]])
It finds where all the nonzero values of M are located (basically the row and col attributes of M.tocoo()), and then just replaces the fill values in Z with the corresponding data values from M.
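For anyone who wants to run the session above outside IPython, here is a minimal self-contained sketch of the same steps. It assumes M is built from the small example in the question and that V is a NumPy column vector of shape (3, 1); the construction of M and V is my assumption, the rest mirrors the lines above.

```python
import numpy as np
from scipy import sparse

# Example data from the question (assumed construction)
M = sparse.csr_matrix(np.array([[0, 1, 2],
                                [3, 4, 0],
                                [6, 0, 8]]))
V = np.array([[11], [22], [33]])

# Dense matrix of M's shape whose row i is filled with V[i]
Z = np.repeat(V, M.shape[1], axis=1)

# Row and column indices of the nonzero entries of M
idx = M.nonzero()

# Overwrite the fill values at those positions with M's stored values
Z[idx] = M.data

print(Z)
```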
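This would fail in the case where M hasn't been pruned, i.e. where some stored elements have been set to zero. M.nonzero() then returns fewer index pairs than there are values in M.data, so the assignment no longer lines up. That's because the full code for M.nonzero is: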
A = self.tocoo()
nz_mask = A.data != 0
return (A.row[nz_mask],A.col[nz_mask])
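To make the failure concrete, here is a small sketch, assuming the example matrix from the question with one stored value overwritten by zero by hand. The variable names (n_stored, n_nonzero, failed, pruned) are only for illustration.

```python
import numpy as np
from scipy import sparse

M = sparse.csr_matrix(np.array([[0, 1, 2],
                                [3, 4, 0],
                                [6, 0, 8]]))
V = np.array([[11], [22], [33]])

# Turn the stored value at (0, 1) into an explicit zero; it stays in M.data
M.data[0] = 0

rows, cols = M.nonzero()
n_stored = M.data.size   # stored values, explicit zero included
n_nonzero = rows.size    # index pairs left after the nz_mask filter

# The index arrays and M.data now have different lengths
Z = np.repeat(V, M.shape[1], axis=1)
try:
    Z[rows, cols] = M.data
    failed = False
except ValueError:
    failed = True

# Pruning the explicit zeros makes the two lengths agree again
pruned = M.copy()
pruned.eliminate_zeros()

print(n_stored, n_nonzero, failed, pruned.data.size)
```

Calling eliminate_zeros() on M before using the M.nonzero() approach is one way to avoid the mismatch.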
It might be safer to use:
In [717]: Mc=M.tocoo()
In [718]: Z[Mc.row, Mc.col] = Mc.data
In [719]: Z
Out[719]:
array([[11, 1, 2],
[ 3, 4, 22],
[ 6, 33, 8]])
That would protect against possible reordering of data when converting from csr to coo.
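The safer variant can be wrapped up as a small function. This is only a sketch: fill_zeros_with_row_value is a hypothetical name, and two things in it go beyond the code above and are my assumptions: it skips explicitly stored zeros (otherwise they would be copied into Z as 0 instead of receiving the fill value), and it picks a common dtype so float values in M are not truncated by an integer V.

```python
import numpy as np
from scipy import sparse


def fill_zeros_with_row_value(M, V):
    """Return a dense array equal to M, with zeros in row i replaced by V[i].

    Hypothetical helper; M is any scipy sparse matrix, V has one entry per row.
    """
    Mc = M.tocoo()
    V = np.asarray(V).reshape(-1, 1)

    # Common dtype of M and V (assumption: avoids truncating floats in M)
    dtype = np.result_type(V.dtype, M.dtype)

    # Dense matrix whose row i is filled with V[i]
    Z = np.repeat(V.astype(dtype), M.shape[1], axis=1)

    # Skip explicitly stored zeros so they also receive the fill value
    keep = Mc.data != 0
    Z[Mc.row[keep], Mc.col[keep]] = Mc.data[keep]
    return Z


M = sparse.csr_matrix(np.array([[0, 1, 2],
                                [3, 4, 0],
                                [6, 0, 8]]))
V = np.array([[11], [22], [33]])
result = fill_zeros_with_row_value(M, V)

# Same matrix, but with the stored value at (0, 1) set to an explicit zero
M2 = M.copy()
M2.data[0] = 0
result_unpruned = fill_zeros_with_row_value(M2, V)
```

Keep in mind that the result is dense: at the sizes quoted in the question (0.15M x 1.3M) it would have about 1.95e11 entries, so in practice this probably has to be applied to smaller blocks of rows.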
Upvotes: 1