Reputation: 403
I built some sparse matrix M
in Python using the coo_matrix
format. I would like to find an efficient way to compute:
A = M + M.T - D
where D
is the restriction of M
to its diagonal (M
is potentially very large). I can't find a way to efficiently build D
while keeping a coo_matrix
format. Any ideas?
Could D = scipy.sparse.spdiags(coo_matrix.diagonal(M),0,M.shape[0],M.shape[0])
be a solution?
Upvotes: 1
Views: 340
Reputation: 231385
I have come up with a faster coo
diagonal:
msk = M.row==M.col
D1 = sparse.coo_matrix((M.data[msk],(M.row[msk],M.col[msk])),shape=M.shape)
sparse.tril
uses this method with mask = A.row + k >= A.col
(sparse/extract.py
)
Some times for a (100,100) M
(and M1 = M.tocsr()
)
In [303]: timeit msk=M.row==M.col; D1=sparse.coo_matrix((M.data[msk],(M.row[msk],M.col[msk])),shape=M.shape)
10000 loops, best of 3: 115 µs per loop
In [305]: timeit D=sparse.diags(M.diagonal(),0)
1000 loops, best of 3: 358 µs per loop
So the coo
way of getting the diagional is fast, at least for this small, and very sparse matrix (only 1 time in the diagonal)
If I start with the csr
form, the diags
is faster. That's because .diagonal
works in the csr
format:
In [306]: timeit D=sparse.diags(M1.diagonal(),0)
10000 loops, best of 3: 176 µs per loop
But creating D
is a small part of the overall calculation. Again, working with M1
is faster. The sum is done in csr
format.
In [307]: timeit M+M.T-D
1000 loops, best of 3: 1.35 ms per loop
In [308]: timeit M1+M1.T-D
1000 loops, best of 3: 1.11 ms per loop
Another way to do the whole thing is to take advantage of that fact that coo
allows duplicate i,j
values, which will be summed when converted to csr
format. So you could stack the row, col, data
arrays for M
with those for M.T
(see M.transpose
for how those are constructed), along with masked values for D
. (or the masked diagonals could be removed from M
or M.T
)
For example:
def MplusMT(M):
msk=M.row!=M.col;
data=np.concatenate([M.data, M.data[msk]])
rows=np.concatenate([M.row, M.col[msk]])
cols=np.concatenate([M.col, M.row[msk]])
MM=sparse.coo_matrix((data, (rows, cols)), shape=M.shape)
return MM
# alt version with a more explicit D
# msk=M.row==M.col;
# data=np.concatenate([M.data, M.data,-M.data[msk]])
MplusMT
as written is very fast because it is just doing array concatenation, not summation. To do that we have to convert it to a csr
matrix.
MplusMT(M).tocsr()
which takes considerably longer. Still this approach is, in my limited testing, more than 2x faster than M+M.T-D
. So it's a potential tool for constructing complex sparse matrices.
Upvotes: 1
Reputation: 35125
You probably want
from scipy.sparse import diags
D = diags(M.diagonal(), 0, format='coo')
This will still build an M-size 1d array as an intermediate step, but that will probably not be so bad.
Upvotes: 0