Reputation: 79
I'm writing an algorithm where I need to 'collapse' or 'reduce' a matrix based on cluster assignments for the different nodes. However, the current implementation is by far the bottleneck of the complete algorithm (profiled with the Visual Studio Python profiler).
def reduce_matrix(mat: np.matrix, cluster_ids: np.array) -> np.matrix:
    """Reduce node adjacency matrix.

    Arguments:
        mat: Adjacency matrix
        cluster_ids: Cluster membership assignment per current node (integers)

    Returns:
        Reduced adjacency matrix
    """
    ordered_nodes = np.argsort(cluster_ids)
    counts = np.unique(cluster_ids, return_counts=True)[1]
    ends = np.cumsum(counts)
    starts = np.concatenate([[0], ends[:-1]])
    clusters = [ordered_nodes[start:end] for start, end in zip(starts, ends)]
    n_c = len(counts)
    reduced = np.mat(np.zeros((n_c, n_c), dtype=int))
    for a in range(n_c):
        a_nodes = clusters[a]
        for b in range(a + 1, n_c):
            b_nodes = clusters[b]
            reduced[a, b] = np.sum(mat[a_nodes, :][:, b_nodes])
            reduced[b, a] = np.sum(mat[b_nodes, :][:, a_nodes])
    return reduced
What would be the fastest way to sum arbitrary rows and columns of a matrix? I believe the double indexing mat[a_nodes, :][:, b_nodes] creates a copy of the matrix instead of a view, but I'm not sure whether there is a quicker workaround.
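Your suspicion is correct: fancy (integer-array) indexing always copies, and np.shares_memory makes that easy to confirm. A small sketch, also showing np.ix_ building the same row/column selection in one step instead of two chained copies:

```python
import numpy as np

mat = np.arange(16).reshape(4, 4)
a_nodes = np.array([0, 2])
b_nodes = np.array([1, 3])

# Chained indexing: mat[a_nodes, :] is already a full copy of those rows.
sub = mat[a_nodes, :][:, b_nodes]
print(np.shares_memory(mat, mat[a_nodes, :]))  # False: fancy indexing copies

# np.ix_ builds the cross-product index, so only one copy is made.
sub2 = mat[np.ix_(a_nodes, b_nodes)]
print(np.array_equal(sub, sub2))  # True: same selection
```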
Upvotes: 1
Views: 125
Reputation: 18668
Numba can speed up such a task in a very natural way, with no sorting needed. Here a lot of irregular chunks must be managed, so NumPy alone is not very efficient:
import numba
import numpy as np

@numba.jit
def reduce_matrix2(mat, cluster_ids):
    n_c = len(set(cluster_ids))
    out = np.zeros((n_c, n_c), dtype=int)
    for i, i_c in enumerate(cluster_ids):
        for j, j_c in enumerate(cluster_ids):
            out[i_c, j_c] += mat[i, j]
    np.fill_diagonal(out, 0)
    return out
On a 5000x5000 mat:

In [40]: %timeit r=reduce_matrix2(mat,cluster_ids)
30.3 ms ± 5.34 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
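For reference, the same scatter-add that the double loop performs can be written in pure NumPy with np.add.at — a sketch, typically slower than the compiled Numba version, but it needs no extra dependency:

```python
import numpy as np

def reduce_matrix_addat(mat, cluster_ids):
    # Scatter-add each entry mat[i, j] into out[cluster_ids[i], cluster_ids[j]];
    # np.add.at handles the repeated target indices correctly.
    n_c = cluster_ids.max() + 1
    out = np.zeros((n_c, n_c), dtype=mat.dtype)
    np.add.at(out, (cluster_ids[:, None], cluster_ids[None, :]), mat)
    np.fill_diagonal(out, 0)
    return out
```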
Upvotes: 2
Reputation: 221744
We can reduce it to one loop by summing more blocks, but over contiguous intervals, with np.add.reduceat; this should be more efficient.
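A quick illustration of the reduceat primitive this relies on: given a list of start indices, it sums each contiguous slice between consecutive starts (the last slice runs to the end):

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7])
# Boundaries [0, 2, 5] yield sums of a[0:2], a[2:5] and a[5:].
sums = np.add.reduceat(a, [0, 2, 5])
print(sums)  # [ 3 12 13]
```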
The implementation would look something like this -
# Get ordered nodes
ordered_nodes = np.argsort(cluster_ids)

# Index into the matrix with the sorted order along both axes
M = mat[np.ix_(ordered_nodes, ordered_nodes)]

# Get group boundaries on sorted cluster ids
sc = cluster_ids[ordered_nodes]
cut_idx = np.flatnonzero(np.r_[True, sc[1:] != sc[:-1], True])

# Setup output array
n_c = len(cut_idx) - 1
out = np.zeros((n_c, n_c), dtype=mat.dtype)

# Per iteration, perform reduction on chunks of the indexed array M,
# with cut_idx as the boundaries
for i, (s0, s1) in enumerate(zip(cut_idx[:-1], cut_idx[1:])):
    out[i] = np.add.reduceat(M[s0:s1], cut_idx[:-1], axis=1).sum(0)
np.fill_diagonal(out, 0)
Proposed approach as func -
def addreduceat_app(mat, cluster_ids):
    ordered_nodes = np.argsort(cluster_ids)
    M = mat[np.ix_(ordered_nodes, ordered_nodes)]
    sc = cluster_ids[ordered_nodes]
    cut_idx = np.flatnonzero(np.r_[True, sc[1:] != sc[:-1], True])
    n_c = len(cut_idx) - 1
    out = np.zeros((n_c, n_c), dtype=mat.dtype)
    for i, (s0, s1) in enumerate(zip(cut_idx[:-1], cut_idx[1:])):
        out[i] = np.add.reduceat(M[s0:s1], cut_idx[:-1], axis=1).sum(0)
    np.fill_diagonal(out, 0)
    return np.matrix(out)
Timings and verification on a dataset with 5000 nodes assigned to 500 unique clusters -
In [518]: np.random.seed(0)
...: mat = np.random.randint(0,10,(5000,5000))
...: cluster_ids = np.random.randint(0,500,(5000))
In [519]: out1 = reduce_matrix(mat, cluster_ids)
...: out2 = addreduceat_app(mat, cluster_ids)
...: print np.allclose(out1, out2)
True
In [520]: %timeit reduce_matrix(mat, cluster_ids)
...: %timeit addreduceat_app(mat, cluster_ids)
1 loop, best of 3: 8.39 s per loop
10 loops, best of 3: 195 ms per loop
Upvotes: 1