Atsushi Sakai
Atsushi Sakai

Reputation: 1014

Efficient connection grouping algorithm or implementation in python

I'm looking for an efficient connection grouping (I'm not sure this is proper name..) algorithm or implementation by python.

For example, I have this nested list:

connection_data = [
   ...:     ["A", "B", "C"],
   ...:     ["B", "D"],
   ...:     ["A", "C"],
   ...:     ["E", "F"],
   ...:     ["C", "D"],
   ...:     ]

This data means each list in the nested list shows connections. For example, the first connection ["A", "B", "C"] means A and B and C are having connection each other. The nested list has multiple connections information.

I would like to calculate connection groupings from a nested list. For example, when I have the upper connection_data, I would like to get

grouped_connection = [
   ...:     ["A", "B", "C", "D"],
   ...:     ["E", "F"],
   ...:     ]

Because, A, B, C, D have connections in these connection data in the connection_data: ["A", "B", "C"], ["B", "D"], ["A", "C"], ["C", "D"], E and F have connection by ["E", "F"].

To summarize my questions:

  1. What is this type of problem called in general?
  2. I think I can implement a many for-loop based solver. But is there any efficient algorithm or implementation in python for this type of problem?

Upvotes: 4

Views: 591

Answers (2)

BrokenBenchmark
BrokenBenchmark

Reputation: 19253

Note: This answer is actually slower than the union-find algorithm given by hilberts_drinking_problem. If all of the input connections are pairs (i.e. of size 2), the runtime of both algorithms is essentially the same. However, this is not the case for OP's question. See the comments for more details.


You can construct a graph where the nodes are lettered A, B, C ..., and place an undirected, unweighted edge between two nodes if the initial groupings dictate that they should be in the same group. Then, the final groups are the connected components of the constructed graph. (This is the closest thing to what your problem is generally called.)

This can be done using a graph traversal algorithm, such as BFS or DFS, but if you don't want to code this up by hand, networkx can take care of everything once you've made the graph. You need to do a little bit of preprocessing on the groupings since some of them are of size greater than two, but otherwise networkx is well-suited for this problem:

from itertools import combinations
import networkx as nx

groups = [
    ["A", "B", "C"],
    ["B", "D"],
    ["A", "C"],
    ["E", "F"],
    ["C", "D"],
]

G = nx.Graph()

# Handle initial groupings of size greater than two by iterating over
# each possible pair of letters in the group.
for group in groups:
    G.add_edges_from(combinations(sorted(group), 2))

# Result should look something like [['B', 'C', 'A', 'D'], ['F', 'E']],
# but the ordering may be nondeterministic.
print(list(list(group) for group in nx.connected_components(G)))

Upvotes: 2

hilberts_drinking_problem
hilberts_drinking_problem

Reputation: 11602

Networkx provides an implementation of a union find datastructure [1] [2] that solves this problem efficiently:

from networkx.utils.union_find import UnionFind

groups = [
    ["A", "B", "C"],
    ["B", "D"],
    ["A", "C"],
    ["E", "F"],
    ["C", "D"],
]

ds = UnionFind()
for gp in groups:
  ds.union(*gp)
for s in ds.to_sets():
  print(s)

# {'B', 'C', 'D', 'A'}
# {'E', 'F'}

Upvotes: 5

Related Questions