Reputation: 125
Input:
list_sets = [{1,2,3}, {4,1,7}, {5,8,9}, {4,10,11}]
expected_output = [{'id1': [1,2,3]}, {'id1': [4,1,7]}, {'id2': [5,8,9]}, {'id1': [4,10,11]}]
What I want to achieve is to assign the same id
to every set in the list that has common value with the other sets. For example 1
is a common value in the first and second set, so it should be assigned with the same id in the dictionary output. But the 4 in the second set is also common with the 4 in the fourth, this means it should be assigned with the same id
as the first and second set.
I have trouble writing the algorithm anyone has any idea or at least direction to go with?
Upvotes: 1
Views: 48
Reputation: 1361
Here is a simple fixed-point algorithm for producing your output.
# I added some more data for testing
list_sets = [{1,2,3}, {4,1,7}, {5,8,9}, {4,10,11}, {12, 13, 14}, {0, 44, 64}, {102, 99, 98}, {101, 23, 102}, {64, 23, 101}]
# 1. find all sets that intersect
# Modifying sets later on, hence copy
# Probably you could avoid some of these copies
merged = [set(s) for s in list_sets]
old_len = 0
while old_len != len(merged):
old_len = len(merged)
new_merged = []
was_merged = [False] * len(merged)
for i, group in enumerate(merged):
if was_merged[i]:
continue
was_merged[i] = True # we can skip group itself in the following loop
for j, other in enumerate(merged):
if was_merged[j]:
continue
if not group.isdisjoint(other):
group |= other
was_merged[j] = True
new_merged.append(group)
merged = new_merged
# 2. build the output from merged
# Your desired output
output = sum(([{f"id{i}": s} for s in list_sets if s <= m] for i, m in enumerate(merged)), start=[])
# output = [
# {'id0': {1, 2, 3}},
# {'id0': {1, 4, 7}},
# {'id0': {4, 10, 11}},
# {'id1': {5, 8, 9}},
# {'id2': {12, 13, 14}},
# {'id3': {0, 44, 64}},
# {'id3': {98, 99, 102}},
# {'id3': {23, 101, 102}},
# {'id3': {23, 64, 101}}
# ]
# This output makes more sense to me
output2 = {f"id{i}": [s for s in list_sets if s <= m] for i, m in enumerate(merged)}
# output2 = {
# 'id0': [{1, 2, 3}, {1, 4, 7}, {4, 10, 11}],
# 'id1': [{5, 8, 9}],
# 'id2': [{12, 13, 14}],
# 'id3': [{0, 44, 64}, {98, 99, 102}, {23, 101, 102}, {23, 64, 101}]
# }
You can probably improve on this algorithm regarding run-time complexity, but it's a first approach.
Upvotes: 1