user15101373
user15101373

Reputation:

How to find the largest linked grouping of sub-graphs?

I want to determine the largest contiguous (if that’s the right word) graph consisting of a bunch of sub-graphs. I define two sub-graphs as being contiguous if any of the nodes between the two sub-graphs are linked.

My initial solution to this is very slow and cumbersome and stupid – just to look at each sub-graph, see if it’s linked to any of the other sub-graphs, and do the analysis for all of the sub-graphs to find the largest number of linked sub-graphs. That’s just me coming from a Fortran background. Is there a better way to do it – a pythonic way, even a graph theory way? I imagine this is a standard question in network science.

Upvotes: 0

Views: 238

Answers (1)

constantstranger
constantstranger

Reputation: 9379

A good starting point to answer the kind of question you've asked is to look at a merge-find (or disjoint-set) approach (https://en.wikipedia.org/wiki/Disjoint-set_data_structure).

It's offers an efficient algorithm (at least on an amortized basis) to identify which members of a collection of graphs are disjoint and which aren't.

Here are a couple of related questions that have pointers to additional resources about this algorithm (also know as "union-find"):

Union find implementation using Python

A set union find algorithm

You can get quite respectable performance by merging two sets using "union by rank" as summarized in the Wikipedia page (and the pseudocode provided therein):

For union by rank, a node stores its rank, which is an upper bound for its height. When a node is initialized, its rank is set to zero. To merge trees with roots x and y, first compare their ranks. If the ranks are different, then the larger rank tree becomes the parent, and the ranks of x and y do not change. If the ranks are the same, then either one can become the parent, but the new parent's rank is incremented by one. While the rank of a node is clearly related to its height, storing ranks is more efficient than storing heights. The height of a node can change during a Find operation, so storing ranks avoids the extra effort of keeping the height correct.

I believe there may be even more sophisticated approaches, but the above union-by-rank implementation is what I have used in the past.

Upvotes: 0

Related Questions