Tiemen Schuijbroek

Reputation: 79

Python matrix sums of arbitrary columns

I'm writing an algorithm in which I need to 'collapse' or 'reduce' a matrix based on cluster assignments for its nodes. However, the current implementation is by far the bottleneck of the complete algorithm (profiled with the Visual Studio Python profiler).

import numpy as np


def reduce_matrix(mat: np.matrix, cluster_ids: np.ndarray) -> np.matrix:
    """Reduce node adjacency matrix.

    Arguments:
        mat: Adjacency matrix
        cluster_ids: Cluster membership assignment per current node (integers)

    Returns:
        Reduced adjacency matrix
    """

    # Sort node indices so members of the same cluster are contiguous
    ordered_nodes = np.argsort(cluster_ids)
    counts = np.unique(cluster_ids, return_counts=True)[1]

    # Start/end offsets of each cluster within ordered_nodes
    ends = np.cumsum(counts)
    starts = np.concatenate([[0], ends[:-1]])

    # Node indices grouped per cluster
    clusters = [ordered_nodes[start:end] for start, end in zip(starts, ends)]

    n_c = len(counts)

    reduced = np.mat(np.zeros((n_c, n_c), dtype=int))
    for a in range(n_c):
        a_nodes = clusters[a]
        for b in range(a + 1, n_c):
            b_nodes = clusters[b]
            reduced[a, b] = np.sum(mat[a_nodes, :][:, b_nodes])
            reduced[b, a] = np.sum(mat[b_nodes, :][:, a_nodes])

    return reduced
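
For concreteness, a tiny worked example (my illustration, not from the original post): a 4x4 matrix with two clusters of two nodes each, with the expected output per the function above.

import numpy as np

mat = np.mat(np.arange(16).reshape(4, 4))
cluster_ids = np.array([0, 0, 1, 1])

# reduced[a, b] sums all entries linking cluster a to cluster b;
# within-cluster sums (the diagonal) are left at zero.
print(reduce_matrix(mat, cluster_ids))
# [[ 0 18]
#  [42  0]]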

What would be the fastest way to sum arbitrary rows and columns in a matrix?

I believe the double indexing mat[a_nodes, :][:, b_nodes] creates a copy of the sub-matrix instead of a view, but I'm not really sure whether there is a quicker workaround...
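
As an aside (a sketch, not from the original question): NumPy's np.ix_ can at least collapse the two indexing steps into one fancy-indexing call; fancy indexing always copies, but this skips the intermediate full-row copy.

import numpy as np

mat = np.arange(16).reshape(4, 4)
a_nodes = np.array([0, 1])
b_nodes = np.array([2, 3])

# Two-step indexing: copies the full rows first, then the columns.
two_step = mat[a_nodes, :][:, b_nodes].sum()

# np.ix_ builds an open mesh, so one indexing call extracts the
# same block directly without the intermediate row copy.
one_step = mat[np.ix_(a_nodes, b_nodes)].sum()

assert two_step == one_step == 18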

Upvotes: 1

Views: 125

Answers (2)

B. M.

Reputation: 18668

Numba can speed up such a task in a very natural way, with no sorting issues. A lot of irregular chunks must be managed here, so plain NumPy is not very efficient:

import numba
import numpy as np


@numba.jit
def reduce_matrix2(mat, cluster_ids):
    # Assumes cluster ids are consecutive integers 0..n_c-1,
    # since they are used directly as indices into out.
    n_c = len(set(cluster_ids))
    out = np.zeros((n_c, n_c), dtype=int)
    # Accumulate every entry into its (cluster, cluster) cell.
    for i, i_c in enumerate(cluster_ids):
        for j, j_c in enumerate(cluster_ids):
            out[i_c, j_c] += mat[i, j]
    # Zero the within-cluster sums to match reduce_matrix's output.
    np.fill_diagonal(out, 0)
    return out

On a 5000x5000 mat:

In [40]: %timeit r=reduce_matrix2(mat,cluster_ids)
30.3 ms ± 5.34 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
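
Note that this relies on the cluster ids being consecutive integers 0..n_c-1, which the question's reduce_matrix does not require. If the labels are arbitrary, np.unique can densify them first; a small preprocessing sketch (my addition, not part of the answer):

import numpy as np

cluster_ids = np.array([10, 42, 10, 7])

# return_inverse relabels arbitrary ids to 0..n_unique-1 while
# preserving which nodes share a cluster.
_, dense_ids = np.unique(cluster_ids, return_inverse=True)
print(dense_ids)  # [1 2 1 0]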

Upvotes: 2

Divakar

Reputation: 221744

We can reduce it to one loop by summing a larger number of blocks, in intervals, with np.add.reduceat; this should be more efficient.
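
As a quick illustration of the building block (my own example, not part of the original answer): np.add.reduceat sums the slices between consecutive start indices, so a single call reduces every column group of a row block at once.

import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])

# Sums a[0:2], a[2:5] and a[5:]; one start index per group.
print(np.add.reduceat(a, [0, 2, 5]))  # [ 3 12  6]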

The implementation would look something like this -

# Get ordered nodes
ordered_nodes = np.argsort(cluster_ids)

# Get indexed array
M = mat[np.ix_(ordered_nodes, ordered_nodes)]

# Get group boundaries on sorted cluster ids
sc = cluster_ids[ordered_nodes]
cut_idx = np.flatnonzero(np.r_[True, sc[1:] != sc[:-1], True])

# Setup output array
n_c = len(cut_idx)-1
out = np.zeros((n_c, n_c), dtype=mat.dtype)

# Per iteration, perform the reduction on one row chunk of the indexed
# array M, with cut_idx defining the column-group boundaries
for i, (s0, s1) in enumerate(zip(cut_idx[:-1], cut_idx[1:])):
    out[i] = np.add.reduceat(M[s0:s1], cut_idx[:-1], axis=1).sum(0)

np.fill_diagonal(out,0)

Benchmarking

Proposed approach as func -

def addreduceat_app(mat, cluster_ids):
    ordered_nodes = np.argsort(cluster_ids)
    M = mat[np.ix_(ordered_nodes, ordered_nodes)]
    sc = cluster_ids[ordered_nodes]
    cut_idx = np.flatnonzero(np.r_[True, sc[1:] != sc[:-1], True])
    n_c = len(cut_idx)-1
    out = np.zeros((n_c, n_c), dtype=mat.dtype)
    for i, (s0, s1) in enumerate(zip(cut_idx[:-1], cut_idx[1:])):
        out[i] = np.add.reduceat(M[s0:s1], cut_idx[:-1], axis=1).sum(0)

    np.fill_diagonal(out,0)
    return np.matrix(out)

Timings and verification on a dataset of 5000 nodes with 500 unique cluster ids -

In [518]: np.random.seed(0)
     ...: mat = np.random.randint(0,10,(5000,5000))
     ...: cluster_ids = np.random.randint(0,500,(5000))

In [519]: out1 = reduce_matrix(mat, cluster_ids)
     ...: out2 = addreduceat_app(mat, cluster_ids)
     ...: print(np.allclose(out1, out2))
True

In [520]: %timeit reduce_matrix(mat, cluster_ids)
     ...: %timeit addreduceat_app(mat, cluster_ids)
1 loop, best of 3: 8.39 s per loop
10 loops, best of 3: 195 ms per loop
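
A side note not from either answer: the same reduction can also be phrased as a single matrix product with a one-hot membership matrix P, i.e. out = P.T @ mat @ P, trading the loop for an (n x n_c) temporary. A minimal sketch (onehot_app is my own name for it):

import numpy as np

def onehot_app(mat, cluster_ids):
    # Densify arbitrary labels to 0..n_c-1.
    _, dense = np.unique(cluster_ids, return_inverse=True)
    n_c = dense.max() + 1
    # One-hot membership: P[i, c] == 1 iff node i belongs to cluster c.
    P = np.zeros((len(dense), n_c), dtype=mat.dtype)
    P[np.arange(len(dense)), dense] = 1
    out = P.T @ np.asarray(mat) @ P
    np.fill_diagonal(out, 0)
    return out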

Upvotes: 1
