Francisco Roque

Reputation: 441

Partitioning a Digraph into n chunks

I have a DiGraph representing a simple network, an example shown below:

[image: example DiGraph]

I would like to partition this network into n chunks of at least 5 nodes each. Each chunk, or partition, needs to be densely connected, and no nodes can be left out. The network may assume different configurations, so a generic algorithm is required.

I have implemented a simple chunker algorithm that operates on the node index list; it works when the network has no nodes with degree higher than 2:

import itertools

# split items into consecutive segments at the given cut indices
def chunks(items, cutpoints):
    return [items[i:j] for i, j in zip([0] + cutpoints, cutpoints + [len(items)])]

# partition the line into n segments by choosing n-1 breakpoints,
# discarding candidates where any segment is shorter than min_size
def generate_chunks(items, n, min_size):
    indices = range(1, len(items))
    for cutpoints in itertools.combinations(indices, n - 1):
        chunk = chunks(items, list(cutpoints))
        if len(min(chunk, key=len)) < min_size:
            continue
        yield chunk
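
For illustration, this is the kind of output the chunker produces on a plain list of indices (toy values, just to show the behaviour):

items = list(range(12))
for candidate in generate_chunks(items, 2, 5):
    print(candidate)
# [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9, 10, 11]]
# [[0, 1, 2, 3, 4, 5], [6, 7, 8, 9, 10, 11]]
# [[0, 1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11]]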

Because the chunker only works on a flat list of node indices, applying it to the full graph produces no suitable candidates:

candidates = []
for candidate in generate_chunks(list(Graph.nodes()), 3, 5):
    suitable_candidate = True
    for segment in candidate:
        subgraph = Graph.subgraph(segment)
        if not nx.is_strongly_connected(subgraph):
            suitable_candidate = False
            break
        # we want subgraphs where no node connects to more than 2 other nodes
        # (in a DiGraph the in- and out-degrees add up, hence the threshold of 4)
        if max(subgraph.degree(), key=lambda x: x[1])[1] > 4:
            suitable_candidate = False
            break
    if suitable_candidate:
        candidates.append(candidate)

I have not been able to modify it to take into account the more complex topology of a graph. I couldn't find a function in networkx that partitions a graph into n random connected pieces. I also considered randomly picking n-1 nodes and calculating shortest paths between them, but that seems like a wasteful operation.

The represented graph is a simplified version of a more complex network.

I'd appreciate it if someone could point me in the right direction.

Upvotes: 2

Views: 1533

Answers (1)

Paul Brodersen

Reputation: 13041

AFAIK, there is nothing off the shelf that fits the bill perfectly. There is the Kernighan-Lin algorithm that partitions a graph into two, so if the desired number of partitions is a power of two, this is a reasonable choice. However, there are no guarantees w.r.t. the minimum chunk size (other than > 1).

partition_1, partition_2 = nx.algorithms.community.kernighan_lin_bisection(g)
subgraph_1, subgraph_2 = g.subgraph(partition_1), g.subgraph(partition_2)
# and recurse on each subgraph until the desired number of partitions is achieved
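
For example, a rough recursive sketch of that idea (my own helper, not something in networkx); note that kernighan_lin_bisection expects an undirected graph, hence the to_undirected() call:

import networkx as nx

def recursive_bisection(g, n):
    # base case: nothing left to split
    if n <= 1 or len(g) < 2:
        return [list(g.nodes())]
    # bisect the (undirected view of the) graph
    part_1, part_2 = nx.algorithms.community.kernighan_lin_bisection(g.to_undirected())
    # recurse, splitting the requested number of partitions between the two halves
    return (recursive_bisection(g.subgraph(part_1), n // 2)
            + recursive_bisection(g.subgraph(part_2), n - n // 2))

partitions = recursive_bisection(g, 4)  # e.g. four chunks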

Since your example looks pretty close to a tree, most of its structure is probably preserved when converting it to a tree. Then you can use Lukes' algorithm, which, however, is parameterized by the maximum chunk size, not the total number of chunks.

max_size = len(g) // n  # maximum number of nodes per chunk; will probably result in more than n chunks
mst = nx.minimum_spanning_tree(g)
partitions = nx.algorithms.community.lukes_partitioning(mst, max_size)
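
lukes_partitioning returns a list of node sets, so you can map each chunk back to a subgraph of the original graph to inspect the result, e.g.:

subgraphs = [g.subgraph(nodes) for nodes in partitions]
print([len(subgraph) for subgraph in subgraphs])  # number of nodes per chunk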

When converting to a tree, using the minimum spanning tree is probably not ideal in this case. What you actually want is an optimally balanced tree such that the maximum branching is minimal. I don't think networkx has an implementation of any of the available heuristics but I am not sure. That being said, the MST often works well enough.

Finally, there is networkx-metis. However, since I have never used it, I can't comment on its usefulness in your case.
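
For completeness, its entry point appears to be nxmetis.partition, which takes the desired number of parts directly (untested sketch; double-check the exact signature against the networkx-metis documentation):

import nxmetis
# returns the value of the edge cut and a list of node lists, one per part
cut_value, parts = nxmetis.partition(g.to_undirected(), n)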

Edit

Here is a rough implementation of a chunking algorithm that is loosely inspired by Lukes' algorithm but parameterized by a minimum chunk size, as this seems to be the more pertinent parameter in your case.

It works well for graphs that are trees or tree-like but yields pretty bad results for densely connected graphs. The problem with dense graphs is that minimum spanning tree algorithms often result in high-degree nodes with many leaves. Presumably, using a balanced spanning tree instead of the minimum spanning tree would improve the results.

[image: resulting partitions of a balanced tree, a random power-law tree, and the karate club graph]

#!/usr/bin/env python
"""
Partition a graph into connected subgraphs of minimum size.
"""
import numpy as np
import matplotlib.pyplot as plt
import networkx as nx


def nx_chunk(graph, chunk_size):
    """
    Chunk a graph into subgraphs with the specified minimum chunk size.

    Inspired by Lukes' algorithm.
    """

    # convert to a tree;
    # a balanced spanning tree would be preferable to a minimum spanning tree,
    # but networkx does not seem to have an implementation of that
    tree = nx.minimum_spanning_tree(graph)

    # select a root that is maximally far away from all leaves
    leaves = [node for node, degree in tree.degree() if degree == 1]
    minimum_distance_to_leaf = {node : tree.size() for node in tree.nodes()}
    for leaf in leaves:
        distances = nx.single_source_shortest_path_length(tree, leaf)
        for node, distance in distances.items():
            if distance < minimum_distance_to_leaf[node]:
                minimum_distance_to_leaf[node] = distance
    root = max(minimum_distance_to_leaf, key=minimum_distance_to_leaf.get)

    # make the tree directed and compute the total descendants of each node
    tree = nx.dfs_tree(tree, root)
    total_descendants = get_total_descendants(tree)

    # prune chunks, starting from the leaves
    chunks = []
    max_descendants = np.max(list(total_descendants.values()))
    while (max_descendants + 1 > chunk_size) and (tree.size() >= 2 * chunk_size):
        for node in list(nx.topological_sort(tree))[::-1]: # i.e. from leaf to root
            if (total_descendants[node] + 1) >= chunk_size:
                chunk = list(nx.descendants(tree, node))
                chunk.append(node)
                chunks.append(chunk)

                # update relevant data structures
                tree.remove_nodes_from(chunk)
                total_descendants = get_total_descendants(tree)
                max_descendants = np.max(list(total_descendants.values()))

                break

    # handle remainder
    chunks.append(list(tree.nodes()))

    return chunks


def get_total_descendants(dag):
    return {node : len(nx.descendants(dag, node)) for node in dag.nodes()}


if __name__ == '__main__':

    fig, axes = plt.subplots(1, 3, figsize=(12, 4))

    examples = nx.balanced_tree(2, 6), nx.random_powerlaw_tree(64), nx.karate_club_graph()

    for g, ax in zip(examples, axes):
        chunks = nx_chunk(g, 5)
        node_to_color = dict()
        for ii, chunk in enumerate(chunks):
            for node in chunk:
                node_to_color[node] = ii
        nx.draw(g, node_color=[node_to_color[node] for node in g.nodes()], cmap='tab20', ax=ax)

    plt.show()
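
A quick way to sanity-check the chunks from the last example (those graphs are undirected; for a DiGraph you would check weak connectivity instead):

for chunk in chunks:
    subgraph = g.subgraph(chunk)
    print(len(chunk), nx.is_connected(subgraph))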


Upvotes: 2
