Tiemen Schuijbroek

Reputation: 79

Python matrix sums of arbitrary columns

I'm writing an algorithm in which I need to 'collapse' or 'reduce' a matrix based on cluster assignments for its nodes. However, the current implementation is by far the bottleneck of the complete algorithm (profiled with the Visual Studio Python profiler).

import numpy as np


def reduce_matrix(mat: np.matrix, cluster_ids: np.ndarray) -> np.matrix:
    """Reduce node adjacency matrix.

    Arguments:
        mat: Adjacency matrix
        cluster_ids: Cluster membership assignment per current node (integers)

    Returns:
        Reduced adjacency matrix
    """

    # Sort node indices so members of the same cluster are contiguous
    ordered_nodes = np.argsort(cluster_ids)
    counts = np.unique(cluster_ids, return_counts=True)[1]

    # Start/end offsets of each cluster within ordered_nodes
    ends = np.cumsum(counts)
    starts = np.concatenate([[0], ends[:-1]])

    # Node indices grouped per cluster
    clusters = [ordered_nodes[start:end] for start, end in zip(starts, ends)]

    n_c = len(counts)

    reduced = np.mat(np.zeros((n_c, n_c), dtype=int))
    for a in range(n_c):
        a_nodes = clusters[a]
        for b in range(a + 1, n_c):
            b_nodes = clusters[b]
            reduced[a, b] = np.sum(mat[a_nodes, :][:, b_nodes])
            reduced[b, a] = np.sum(mat[b_nodes, :][:, a_nodes])

    return reduced
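
For concreteness, a tiny worked example (my illustration, not from the original post): a 4x4 matrix with two clusters of two nodes each, with the expected output per the function above.

import numpy as np

mat = np.mat(np.arange(16).reshape(4, 4))
cluster_ids = np.array([0, 0, 1, 1])

# reduced[a, b] sums all entries linking cluster a to cluster b;
# within-cluster sums (the diagonal) are left at zero.
print(reduce_matrix(mat, cluster_ids))
# [[ 0 18]
#  [42  0]]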

What would be the fastest way to sum arbitrary rows and columns in a matrix?

I believe the double indexing mat[a_nodes, :][:, b_nodes] creates a copy of the sub-matrix instead of a view, but I'm not really sure whether there is a quicker workaround...
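
As an aside (a sketch, not from the original question): NumPy's np.ix_ can at least collapse the two indexing steps into one fancy-indexing call; fancy indexing always copies, but this skips the intermediate full-row copy.

import numpy as np

mat = np.arange(16).reshape(4, 4)
a_nodes = np.array([0, 1])
b_nodes = np.array([2, 3])

# Two-step indexing: copies the full rows first, then the columns.
two_step = mat[a_nodes, :][:, b_nodes].sum()

# np.ix_ builds an open mesh, so one indexing call extracts the
# same block directly without the intermediate row copy.
one_step = mat[np.ix_(a_nodes, b_nodes)].sum()

assert two_step == one_step == 18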

Upvotes: 1

Views: 125

Answers (2)

B. M.

Reputation: 18668

Numba can speed up such a task in a very natural way, with no sorting issues. A lot of irregular chunks must be managed here, so plain NumPy is not very efficient:

import numba
import numpy as np


@numba.jit
def reduce_matrix2(mat, cluster_ids):
    # Assumes cluster ids are consecutive integers 0..n_c-1,
    # since they are used directly as indices into out.
    n_c = len(set(cluster_ids))
    out = np.zeros((n_c, n_c), dtype=int)
    # Accumulate every entry into its (cluster, cluster) cell.
    for i, i_c in enumerate(cluster_ids):
        for j, j_c in enumerate(cluster_ids):
            out[i_c, j_c] += mat[i, j]
    # Zero the within-cluster sums to match reduce_matrix's output.
    np.fill_diagonal(out, 0)
    return out

On a 5000x5000 mat:

In [40]: %timeit r=reduce_matrix2(mat,cluster_ids)
30.3 ms ± 5.34 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
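
Note that this relies on the cluster ids being consecutive integers 0..n_c-1, which the question's reduce_matrix does not require. If the labels are arbitrary, np.unique can densify them first; a small preprocessing sketch (my addition, not part of the answer):

import numpy as np

cluster_ids = np.array([10, 42, 10, 7])

# return_inverse relabels arbitrary ids to 0..n_unique-1 while
# preserving which nodes share a cluster.
_, dense_ids = np.unique(cluster_ids, return_inverse=True)
print(dense_ids)  # [1 2 1 0]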

Upvotes: 2

Divakar

Reputation: 221744

We can reduce it to one loop by summing a larger number of blocks, in intervals, with np.add.reduceat; this should be more efficient.
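
As a quick illustration of the building block (my own example, not part of the original answer): np.add.reduceat sums the slices between consecutive start indices, so a single call reduces every column group of a row block at once.

import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])

# Sums a[0:2], a[2:5] and a[5:]; one start index per group.
print(np.add.reduceat(a, [0, 2, 5]))  # [ 3 12  6]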

The implementation would look something like this -

# Get ordered nodes
ordered_nodes = np.argsort(cluster_ids)

# Get indexed array
M = mat[np.ix_(ordered_nodes, ordered_nodes)]

# Get group boundaries on sorted cluster ids
sc = cluster_ids[ordered_nodes]
cut_idx = np.flatnonzero(np.r_[True, sc[1:] != sc[:-1], True])

# Setup output array
n_c = len(cut_idx)-1
out = np.zeros((n_c, n_c), dtype=mat.dtype)

# Per iteration, perform the reduction on one row chunk of the indexed
# array M, with cut_idx defining the column-group boundaries
for i, (s0, s1) in enumerate(zip(cut_idx[:-1], cut_idx[1:])):
    out[i] = np.add.reduceat(M[s0:s1], cut_idx[:-1], axis=1).sum(0)

np.fill_diagonal(out,0)

Benchmarking

Proposed approach as func -

def addreduceat_app(mat, cluster_ids):
    ordered_nodes = np.argsort(cluster_ids)
    M = mat[np.ix_(ordered_nodes, ordered_nodes)]
    sc = cluster_ids[ordered_nodes]
    cut_idx = np.flatnonzero(np.r_[True, sc[1:] != sc[:-1], True])
    n_c = len(cut_idx)-1
    out = np.zeros((n_c, n_c), dtype=mat.dtype)
    for i, (s0, s1) in enumerate(zip(cut_idx[:-1], cut_idx[1:])):
        out[i] = np.add.reduceat(M[s0:s1], cut_idx[:-1], axis=1).sum(0)

    np.fill_diagonal(out,0)
    return np.matrix(out)

Timings and verification on a dataset of 5000 nodes with 500 unique cluster ids -

In [518]: np.random.seed(0)
     ...: mat = np.random.randint(0,10,(5000,5000))
     ...: cluster_ids = np.random.randint(0,500,(5000))

In [519]: out1 = reduce_matrix(mat, cluster_ids)
     ...: out2 = addreduceat_app(mat, cluster_ids)
     ...: print(np.allclose(out1, out2))
True

In [520]: %timeit reduce_matrix(mat, cluster_ids)
     ...: %timeit addreduceat_app(mat, cluster_ids)
1 loop, best of 3: 8.39 s per loop
10 loops, best of 3: 195 ms per loop
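
A side note not from either answer: the same reduction can also be phrased as a single matrix product with a one-hot membership matrix P, i.e. out = P.T @ mat @ P, trading the loop for an (n x n_c) temporary. A minimal sketch (onehot_app is my own name for it):

import numpy as np

def onehot_app(mat, cluster_ids):
    # Densify arbitrary labels to 0..n_c-1.
    _, dense = np.unique(cluster_ids, return_inverse=True)
    n_c = dense.max() + 1
    # One-hot membership: P[i, c] == 1 iff node i belongs to cluster c.
    P = np.zeros((len(dense), n_c), dtype=mat.dtype)
    P[np.arange(len(dense)), dense] = 1
    out = P.T @ np.asarray(mat) @ P
    np.fill_diagonal(out, 0)
    return out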

Upvotes: 1
