Reputation: 33
def simrank_sparse(A, c, maxiter, eps=1e-4):
    if not sp.issparse(A):
        raise Exception("Input is not a sparse matrix")
    n = sp.csr_matrix.get_shape(A)[0]
    Q = misc.get_column_normalized_matrix(A)
    sim = sp.eye(n)
    I = sp.eye(n)
    sim_prev = sp.csr_matrix(sim)
    for t in range(maxiter):
        if sc.allclose(sim, sim_prev, atol=eps):
            break
        sim_prev = sc.copy(sim)
        sim = c * (Q.T * sim_prev * Q) + (1 - c) * I
    print("Converged after %d iterations (eps=%f)." % (t, eps))
    return sim
I am using sparse matrices, but the numpy.allclose() function is giving errors because it only accepts NumPy arrays as input. I don't want to convert the sparse matrices to arrays and back to sparse matrices again, as that would be inefficient. Is there another way to check two sparse matrices with allclose()?
Upvotes: 3
Views: 1988
Reputation: 231540
If you know that the matrices match in sparsity (same number of nonzeros and matching nonzero indices), then you could just compare the data attributes:

np.allclose(sim.data, sim_prev.data, atol=eps)

This would be true if the matrices are constructed in the same way, or one is a sparsity-preserving derivation of the other.
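A minimal sketch of that idea, with a guard so the data comparison is only used when the sparsity patterns actually match (the helper name same_pattern_allclose is made up here, and the test matrices are arbitrary):

```python
import numpy as np
import scipy.sparse as sp

def same_pattern_allclose(a, b, atol=1e-4):
    # Comparing .data element-by-element is only meaningful when both CSR
    # matrices store their nonzeros at the same positions, which is what
    # identical indptr and indices arrays guarantee.
    if not (np.array_equal(a.indptr, b.indptr) and
            np.array_equal(a.indices, b.indices)):
        raise ValueError("sparsity patterns differ; compare full matrices")
    return np.allclose(a.data, b.data, atol=atol)

A = sp.random(100, 100, density=0.1, format='csr', random_state=0)
B = A.copy()
B.data += 1e-6   # sparsity-preserving perturbation, well inside atol

print(same_pattern_allclose(A, B, atol=1e-4))  # True
```

This mirrors the iterative-update case in the question: each new sim is derived from sim_prev by the same multiply-and-add, so when the patterns coincide the cheap path applies.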
Some time tests:
In [153]: M = sparse.random(1000,1000,.2, 'csr')
In [154]: M
Out[154]:
<1000x1000 sparse matrix of type '<class 'numpy.float64'>'
with 200000 stored elements in Compressed Sparse Row format>
Another matrix of the same shape, but different sparsity:
In [155]: M1 = sparse.random(1000,1000,.2, 'csr')
In [156]:
In [156]: timeit np.abs(M-M1).max()
12.3 ms ± 339 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [157]: timeit np.abs(M.A-M1.A).max()
24.4 ms ± 624 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
If the matrices match in sparsity, we can compare the data
attributes for a considerable time savings:
In [158]: M2 = M.copy()
In [159]: M2.data += np.random.rand(M2.data.shape[0])*.001
In [160]: timeit np.abs(M.data-M2.data).max()
2.77 ms ± 13.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
A preliminary check on sparsity could save a lot of time (provided we aren't worried about epsilon-sized nonzero values):
In [170]: timeit np.allclose(M.indptr,M1.indptr)
97.8 µs ± 2.29 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
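To show why that check is so cheap: indptr holds one row offset per row (n+1 entries), so comparing it is O(n) rather than O(nnz). A small illustration (assuming SciPy; the matrices are made up):

```python
import numpy as np
import scipy.sparse as sp

M  = sp.random(1000, 1000, 0.2, format='csr', random_state=1)
M1 = sp.random(1000, 1000, 0.2, format='csr', random_state=2)
M2 = M.copy()

# indptr has 1001 entries here, versus 200000 entries in .data,
# so a structure-only comparison touches far less memory.
print(np.array_equal(M.indptr, M2.indptr))  # True: identical pattern
print(np.array_equal(M.indptr, M1.indptr))  # False: independent random matrices
```

If the indptr (and indices) arrays differ, the matrices cannot have matching patterns and the fast data-only comparison is off the table.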
Removing the toarray() step from the timings gives, in effect, allclose on equivalent-size dense arrays:
In [164]: %%timeit Ma=M.A; M1a=M1.A
...: np.abs(Ma-M1a).max()
...:
14.8 ms ± 31 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Subtraction of the sparse matrices does better than I expected. Actually, I don't need np.abs; the Python builtin abs delegates to the M.__abs__ method, which is:
def __abs__(self):
    return self._with_data(abs(self._deduped_data()))
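Since abs on a sparse matrix returns another sparse matrix with the same pattern, the whole comparison can stay sparse end to end. A quick demonstration (the matrices here are arbitrary):

```python
import numpy as np
import scipy.sparse as sp

M = sp.random(50, 50, 0.2, format='csr', random_state=0)
D = abs(M - 0.5 * M)      # plain Python abs; no np.abs needed

print(sp.issparse(D))     # True: abs preserves sparsity
# D stores the same nonzero positions as M, with halved magnitudes
print(np.allclose(D.data, 0.5 * np.abs(M.data)))  # True
```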
Upvotes: 0
Reputation: 191
You can set up the comparison with sparse operations, and then use NumPy for the evaluation:

def csr_allclose(a, b, rtol=1e-5, atol=1e-8):
    # max over |a - b| - rtol*|b| is <= atol exactly when
    # |a - b| <= atol + rtol*|b| holds elementwise
    c = abs(a - b) - rtol * abs(b)
    return c.max() <= atol
The csr_matrix c holds the difference of the two matrices being compared, adjusted by the relative tolerance; if any entry exceeds the atol threshold, csr_allclose returns False. This implementation doesn't offer the NaN handling of the numpy.allclose method, however.
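A self-contained check of this approach, with the comparison written so that the elementwise condition |a - b| <= atol + rtol*|b| of allclose is preserved (the test matrices are made up):

```python
import numpy as np
import scipy.sparse as sp

def csr_allclose(a, b, rtol=1e-5, atol=1e-8):
    # Sparse analogue of np.allclose: the max over |a - b| - rtol*|b|
    # is <= atol exactly when every element satisfies the allclose bound.
    c = abs(a - b) - rtol * abs(b)
    return c.max() <= atol

A = sp.random(200, 200, 0.1, format='csr', random_state=3)
B = A.copy()
B.data += 1e-9                 # within the default tolerances

print(csr_allclose(A, B))      # True
print(csr_allclose(A, 2 * A))  # False: entries differ by |A| itself
```

Note that c.max() on a sparse matrix also considers the implicit zeros, but those positions trivially satisfy the bound whenever atol >= 0.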
Upvotes: 2