Reputation: 441
I have a DiGraph representing a simple network, an example shown below:
I would like to partition this network into n chunks of at least 5 nodes. Each chunk, or partition needs to be densely connected, and no nodes can be left out. The network may assume different configurations, so a generic algorithm is required.
I have implemented a simple chunker algorithm that works when the network has no nodes with degree higher than 2, on the node index list:
import itertools
def chunks(items, cutpoints):
return [items[i:j] for i,j in zip([0] + cutpoints, cutpoints + [len(items)])]
# this function will partition the line into segments, based on the number of breakpoints n-1
def generate_chunks(items, n, min_size):
indices = range(1,len(items))
for cutpoints in itertools.combinations(indices,n-1):
chunk = chunks(items,list(cutpoints))
if len(min(chunk, key=len)) < min_size:
continue
yield chunk
Due to the way the chunker function works on a list, when applied on the full graph this generates no chunks:
candidates = []
for candidate in generate_chunks(list(Graph.nodes()), 3, 5):
suitable_candidate = True
for segment in candidate:
subgraph = Graph.subgraph(segment)
if not nx.is_strongly_connected(subgraph):
suitable_candidate = False
break
# we want subgraphs where no nodes are connecting to more than 2 nodes
if max(subgraph.degree(), key = lambda x: x[1])[1] > 4:
suitable_candidate = False
break
if suitable_candidate:
candidates.append(candidate)
I have not been able to modify it so that it takes into account the more complex topology of a graph. I couldn't find a function in networkx that partitions a graph into n random connected pieces. I also considered randomly picking n-1 nodes and calculate shortest paths between them, but seems like a wasteful operation.
The represented graph is a simplified version of a more complex network.
Appreciate if someone could point me in the right direction.
Upvotes: 2
Views: 1533
Reputation: 13041
AFAIK, there is nothing off the shelf that fits the bill perfectly. There is the Kernighan-Lin algorithm that partitions a graph into two, so if the desired number of partitions is a power of two, this is a reasonable choice. However, there are no guarantees w.r.t. the minimum chunk size (other than > 1).
partition_1, partition_2 = nx.algorithms.community.kernighan_lin_bisection(g)
subgraph_1, subgraph_2 = g.subgraph(partition_1), g.subgraph(partition_2)
# and recurse on each subgraph until the desired number of partitions is achieved
Since your example looks pretty close to a tree, most of its structure is probably preserved when converting it to a tree. Then you can use Lukes' algorithm, which, however, is parameterized by the maximum chunk size, not the total number of chunks.
max_size = g.size() / n # will probably result in more than n chunks
mst = nx.minimum_spanning_tree(g)
partitions = nx.algorithms.community.lukes_partitioning(mst, max_size)
When converting to a tree, using the minimum spanning tree is probably not ideal in this case. What you actually want is an optimally balanced tree such that the maximum branching is minimal. I don't think networkx has an implementation of any of the available heuristics but I am not sure. That being said, the MST often works well enough.
Finally, there is networkx-metis. However, since I have never used it, I can't comment on its usefulness in your case.
Here is a rough implementation of a chunking algorithm that is loosely inspired by Lukes' algorithm but parameterized by a minimum size as this seems to be the more pertinent parameter in your case.
It works well for graphs that are trees or tree-like but yields pretty bad results for densely connected graphs. The problem with dense graphs is that minimum spanning tree algorithms often result in high-degree nodes with many leaves. Presumably, using a balanced spanning tree instead of the minimum spanning tree would improve the results.
#!/usr/bin/env python
"""
Partition a graph into connected subgraphs of minimum size.
"""
import numpy as np
import matplotlib.pyplot as plt
import networkx as nx
def nx_chunk(graph, chunk_size):
"""
Chunk a graph into subgraphs with the specified minimum chunk size.
Inspired by Lukes algorithm.
"""
# convert to a tree;
# a balanced spanning tree would be preferable to a minimum spanning tree,
# but networkx does not seem to have an implementation of that
tree = nx.minimum_spanning_tree(graph)
# select a root that is maximally far away from all leaves
leaves = [node for node, degree in tree.degree() if degree == 1]
minimum_distance_to_leaf = {node : tree.size() for node in tree.nodes()}
for leaf in leaves:
distances = nx.single_source_shortest_path_length(tree, leaf)
for node, distance in distances.items():
if distance < minimum_distance_to_leaf[node]:
minimum_distance_to_leaf[node] = distance
root = max(minimum_distance_to_leaf, key=minimum_distance_to_leaf.get)
# make the tree directed and compute the total descendants of each node
tree = nx.dfs_tree(tree, root)
total_descendants = get_total_descendants(tree)
# prune chunks, starting from the leaves
chunks = []
max_descendants = np.max(list(total_descendants.values()))
while (max_descendants + 1 > chunk_size) & (tree.size() >= 2 * chunk_size):
for node in list(nx.topological_sort(tree))[::-1]: # i.e. from leaf to root
if (total_descendants[node] + 1) >= chunk_size:
chunk = list(nx.descendants(tree, node))
chunk.append(node)
chunks.append(chunk)
# update relevant data structures
tree.remove_nodes_from(chunk)
total_descendants = get_total_descendants(tree)
max_descendants = np.max(list(total_descendants.values()))
break
# handle remainder
chunks.append(list(tree.nodes()))
return chunks
def get_total_descendants(dag):
return {node : len(nx.descendants(dag, node)) for node in dag.nodes()}
if __name__ == '__main__':
fig, axes = plt.subplots(1, 3, figsize=(12, 4))
examples = nx.balanced_tree(2, 6), nx.random_powerlaw_tree(64), nx.karate_club_graph()
for g, ax in zip(examples, axes):
chunks = nx_chunk(g, 5)
node_to_color = dict()
for ii, chunk in enumerate(chunks):
for node in chunk:
node_to_color[node] = ii
nx.draw(g, node_color=[node_to_color[node] for node in g.nodes()], cmap='tab20', ax=ax)
plt.show()
for ii, chunk in enumerate(chunks):
for node in chunk:
node_to_color[node] = ii
nx.draw(g, node_color=[node_to_color[node] for node in g.nodes()], cmap='tab20')
plt.show()
Upvotes: 2