mgalardini

Reputation: 1467

scipy sparse: obtain values present in both matrices

I have two sparse matrices that I want to compare element-wise:

from scipy import sparse as sp
t1 = sp.random(10, 10, 0.5)
t2 = sp.random(10, 10, 0.5)

In particular I would like to make a scatterplot for those elements present (i.e. non-zero) in both matrices, but so far the only way I could think of is to convert them to the dense format:

import matplotlib.pyplot as plt
plt.plot(t1.todense().flatten(),
         t2.todense().flatten(),
         'ko',
         alpha=0.1)

This performs terribly when the matrices are very large. Is there a more efficient way to do this?

Upvotes: 0

Views: 48

Answers (1)

hpaulj

Reputation: 231625

In [256]: t1
Out[256]: 
<10x10 sparse matrix of type '<class 'numpy.float64'>'
    with 50 stored elements in COOrdinate format>
In [257]: t2
Out[257]: 
<10x10 sparse matrix of type '<class 'numpy.float64'>'
    with 50 stored elements in COOrdinate format>

When you plot t1.todense().flatten(), you plot a data point for every element of t1, whether zero or not. In this case that is 100 points.

One way to 'weed' out the zero elements is:

In [258]: t3 = t1.multiply(t2)
In [259]: t3
Out[259]: 
<10x10 sparse matrix of type '<class 'numpy.float64'>'
    with 28 stored elements in Compressed Sparse Row format>
In [260]: t11 = t3.astype(bool).multiply(t1)
In [261]: t21 = t3.astype(bool).multiply(t2)
In [262]: t11
Out[262]: 
<10x10 sparse matrix of type '<class 'numpy.float64'>'
    with 28 stored elements in Compressed Sparse Row format>

t3 has nonzero values only where both t1 and t2 are nonzero. t11 holds the corresponding elements of t1: t3.astype(bool) turns the stored floats into True, which acts as 1 in the multiply. Sparse element-wise multiply is relatively efficient (maybe not as much as the corresponding dense multiply, or even the sparse matrix product).

We could plot t11.todense().ravel() etc. That would look the same, except for a concentration of points at (0.0, 0.0). But the data attribute holds just the nonzero values, and t11 and t21 have the same sparsity pattern, so we can plot those directly - only 28 values in this case:

plt.plot(t11.data,
         t21.data,
         'ko',
         alpha=0.1);
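
Pairing t11.data with t21.data relies on both matrices storing their entries in the same order. A minimal sketch of a sanity check, assuming both results are held in CSR format as in the session above (the seeds and the explicit .tocsr() calls are my additions, only there to make the example repeatable):

```python
import numpy as np
from scipy import sparse as sp

# Seeded stand-ins for the matrices in the question (seeds are arbitrary).
t1 = sp.random(10, 10, 0.5, random_state=0)
t2 = sp.random(10, 10, 0.5, random_state=1)

t3 = t1.multiply(t2)                          # nonzero only where both are nonzero
t11 = t3.astype(bool).multiply(t1).tocsr()    # values of t1 at the shared positions
t21 = t3.astype(bool).multiply(t2).tocsr()    # values of t2 at the shared positions

# Identical CSR structure means data[k] refers to the same (row, col) in both.
same_pattern = (np.array_equal(t11.indptr, t21.indptr)
                and np.array_equal(t11.indices, t21.indices))
print(same_pattern, t11.nnz, t21.nnz)
```

If same_pattern were ever False, the two data arrays would not line up element by element and should not be plotted against each other as-is.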

There may be other ways of getting t11 and t21 matrices, but the basic idea still applies - get two matrices with the same sparsity, and plot just their data values.
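
As one such alternative, here is a sketch that skips the multiplications and intersects the stored coordinates directly. The helper name common_values is hypothetical, and the approach (flattened indices plus np.intersect1d) is a substitute technique, not what the session above does:

```python
import numpy as np
from scipy import sparse as sp

def common_values(a, b):
    """Return (a_vals, b_vals) at positions where both a and b are nonzero."""
    if a.shape != b.shape:
        raise ValueError("matrices must have the same shape")
    coords = []
    for m in (a, b):
        m = m.tocsr(copy=True)      # work on a copy; COO duplicates get summed
        m.eliminate_zeros()         # drop explicitly stored zeros
        coords.append(m.tocoo())
    a, b = coords
    n_cols = a.shape[1]
    # Row-major linear index of every stored entry (int64 to avoid overflow).
    flat_a = a.row.astype(np.int64) * n_cols + a.col
    flat_b = b.row.astype(np.int64) * n_cols + b.col
    # ia / ib index into a.data / b.data at the shared positions.
    _, ia, ib = np.intersect1d(flat_a, flat_b, return_indices=True)
    return a.data[ia], b.data[ib]

# Seeded stand-ins for the matrices in the question (seeds are arbitrary).
t1 = sp.random(10, 10, 0.5, random_state=0)
t2 = sp.random(10, 10, 0.5, random_state=1)
x, y = common_values(t1, t2)
print(x.shape, y.shape)
```

The resulting x and y can be passed to plt.plot(x, y, 'ko', alpha=0.1) just like t11.data and t21.data. Only the stored entries are touched, so nothing is ever converted to dense.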

Plot: t1 vs t2

Plot: t11.data vs t21.data

Upvotes: 1
