MacGuffin

Reputation: 33

numpy.allclose() on SciPy sparse matrix

import numpy as np
import scipy.sparse as sp
import misc  # my own helper module (provides get_column_normalized_matrix)

def simrank_sparse(A, c, maxiter, eps=1e-4):

    if not sp.issparse(A):
        raise Exception("Input is not a sparse matrix")

    n = A.get_shape()[0]
    Q = misc.get_column_normalized_matrix(A)
    sim = sp.eye(n)
    I = sp.eye(n)
    sim_prev = sp.csr_matrix(sim)

    for t in range(maxiter):

        if np.allclose(sim, sim_prev, atol=eps):  # this call raises
            break
        sim_prev = sim.copy()
        sim = c * (Q.T * sim_prev * Q) + (1 - c) * I

    print("Converged after %d iterations (eps=%f)." % (t, eps))
    return sim

I am using sparse matrices, but numpy.allclose() raises an error because it only accepts NumPy arrays as input. I don't want to convert the sparse matrices to dense arrays and back again, as that would be inefficient. Is there another way to compare two sparse matrices the way allclose() does?

Upvotes: 3

Views: 1988

Answers (2)

hpaulj

Reputation: 231540

If you know that the matrices match in sparsity - same number of nonzeros and matching nonzero indices - then you could just compare the data:

np.allclose(sim.data,sim_prev.data,atol=eps)

That would be true if the matrices are constructed in the same way, or if one is a sparsity-preserving derivation of the other.
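For example, the pattern check can be made explicit before trusting the `.data` comparison. This is only a sketch; `allclose_same_pattern` is a hypothetical helper, not part of SciPy:

```python
import numpy as np
from scipy import sparse

def allclose_same_pattern(a, b, atol=1e-8):
    """Compare two sparse matrices assumed to share a sparsity pattern,
    verifying that assumption before trusting the .data comparison."""
    a, b = a.tocsr(), b.tocsr()
    if not (np.array_equal(a.indptr, b.indptr)
            and np.array_equal(a.indices, b.indices)):
        return False  # patterns differ; the fast .data path does not apply
    return np.allclose(a.data, b.data, atol=atol)

M = sparse.random(1000, 1000, 0.2, format='csr')
M2 = M.copy()
M2.data += 1e-12  # tiny, pattern-preserving perturbation
print(allclose_same_pattern(M, M2))  # True
```

Since `M2` is a copy of `M`, the `indptr` and `indices` arrays match exactly and only the cheap `.data` comparison runs.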

Some time tests:

In [153]: M = sparse.random(1000,1000,.2, 'csr')
In [154]: M
Out[154]: 
<1000x1000 sparse matrix of type '<class 'numpy.float64'>'
    with 200000 stored elements in Compressed Sparse Row format>

Another matrix of the same shape, but different sparsity:

In [155]: M1 = sparse.random(1000,1000,.2, 'csr')
In [156]: 
In [156]: timeit np.abs(M-M1).max()
12.3 ms ± 339 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [157]: timeit np.abs(M.A-M1.A).max()
24.4 ms ± 624 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

If the matrices match in sparsity, we can compare the data attributes for a considerable time savings:

In [158]: M2 = M.copy()
In [159]: M2.data += np.random.rand(M2.data.shape[0])*.001
In [160]: timeit np.abs(M.data-M2.data).max()
2.77 ms ± 13.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

A preliminary check on sparsity could save a lot of time (provided we aren't worried about epsilon-sized nonzero values):

In [170]: timeit np.allclose(M.indptr,M1.indptr)
97.8 µs ± 2.29 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Removing the toarray() step from the timings gives, in effect, allclose on equivalent-size dense arrays:

In [164]: %%timeit Ma=M.A; M1a=M1.A
     ...: np.abs(Ma-M1a).max()
     ...: 
14.8 ms ± 31 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Subtraction of the sparse matrices is doing better than I expected.


Actually I don't need np.abs; the Python builtin abs delegates to the M.__abs__ method, which is:

def __abs__(self):
    return self._with_data(abs(self._deduped_data()))
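A quick check (a minimal sketch with made-up matrices) that the builtin abs indeed stays sparse, so abs(M - M1).max() never builds a dense intermediate:

```python
from scipy import sparse

A = sparse.random(100, 100, 0.1, format='csr')
B = sparse.random(100, 100, 0.1, format='csr')

D = abs(A - B)             # Python's abs() dispatches to the sparse __abs__
print(sparse.issparse(D))  # True: the result is still a sparse matrix
print(D.max())             # max works directly on the sparse result
```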

Upvotes: 0

K. Nielson

Reputation: 191

You can do the arithmetic in sparse form and then use NumPy only for the final evaluation:

def csr_allclose(a, b, rtol=1e-5, atol=1e-8):
    # numpy.allclose rule: every entry must satisfy |a - b| <= atol + rtol * |b|
    c = np.abs(a - b) - rtol * np.abs(b)
    return c.max() <= atol

The sparse matrix c holds the tolerance-adjusted differences of the two matrices being compared, and if any entry exceeds atol, csr_allclose returns False. Unlike numpy.allclose, this implementation doesn't handle NaN values.
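A self-contained usage sketch of this approach, following numpy.allclose's |a - b| <= atol + rtol * |b| rule and using the builtin abs so the arithmetic stays sparse (the test matrices here are made up):

```python
import scipy.sparse as sp

def csr_allclose(a, b, rtol=1e-5, atol=1e-8):
    # every entry must satisfy |a - b| <= atol + rtol * |b|
    c = abs(a - b) - rtol * abs(b)  # builtin abs keeps the result sparse
    return c.max() <= atol

A = sp.random(100, 100, 0.1, format='csr')
B = A.copy()
B.data += 1e-12   # perturbation far below atol
C = A.copy()
C.data += 1.0     # perturbation far above atol

print(csr_allclose(A, B))  # True
print(csr_allclose(A, C))  # False
```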

Upvotes: 2
