Reputation: 1718
I have a (a very large) list of sets, containing pairs of values such as:
SetList = [{1,2},{2,3},{4,5},{5,6},{1,7}]
I'd like to efficiently determine the sets of values that are disjoint as implied by the transitivity of the relationships in the pairs above. For example, 1 is associated with 2, and 2 with 3 and so 1,2,3 are associated. Similarly 1 is associated with 7, so 1,2,3, and 7 are associated. In the above, 4, 5, and 6 are associated, but not with the remaining values. The result should look like the following:
DisjointSets = [{1,2,3,7},{4,5,6}]
Is there are simple, and efficient, way to perform this operation that I'm missing? Thank you!
Upvotes: 2
Views: 175
Reputation: 44465
The graph approach is likely faster than recursion, but for those interested in pure Python:
def get_disjoints(lst):
"""Return disjoints."""
def rec_disjoints(lst):
if not lst:
return disjoints
else:
chosen = lst[0]
# Iterat/Mutate list trick using indicies
for i, s in reversed(list(enumerate(lst[:]))):
if not chosen.isdisjoint(s):
chosen.update(s)
del lst[i]
disjoints.append(chosen)
return rec_disjoints(lst)
disjoints = []
return rec_disjoints(lst)
lst = [{1,2}, {2,3}, {4,5}, {5,6}, {1,7}]
get_disjoints(lst)
# [{1, 2, 3, 7}, {4, 5, 6}]
This takes advantage of the helpful isdisjoint
method for sets. Although, iteration + function calls + recursion will reduce performance.
Here are tests for robustness, applicable for other contributors:
import nose.tools as nt
def test_disjoint(f):
"Verify the test function generates expected disjoints."
def verify(lst1, lst2):
actual, expected = lst1, lst2
nt.eq_(actual, expected)
verify(f([{1,2}, {2,3}, {4,5}, {5,6}, {1,7}]),
[{1,2,3,7}, {4,5,6}])
verify(f([{4,5}, {5,6}, {1,7}]),
[{4,5,6}, {1,7}])
verify(f([{1,7}]),
[{1,7}])
verify(f([{1,2}, {2,3}, {4,5}, {5,6}, {1,7}, {10, 11}]),
[{1,2,3,7}, {4,5,6}, {10,11}])
verify(f([{4,5}, {5,6}, {1,7}, {10, 11}]),
[{4,5,6}, {1,7}, {10,11}])
verify(f([{1,2}, {4,5}, {6,7}]),
[{1,2}, {4,5}, {6,7}])
test_disjoint(f=get_disjoints)
Upvotes: 0
Reputation: 1718
Converting my original list to tuples:
TupleList = [(1,2),(2,3),(4,5),(5,6),(1,7)]
I used networkx via (thanks @user2357112):
import networkx as nx
G = nx.path_graph(0)
G.add_edges_from(TupleList)
DisjointSets = list(nx.connected_components(G))
Is this the most efficient way to solve the problem? Any other ideas?
Upvotes: 5