Y.C.T
Y.C.T

Reputation: 125

Union sets that are linked with a common value

Input:

list_sets = [{1,2,3}, {4,1,7}, {5,8,9}, {4,10,11}]
expected_output = [{'id1': [1,2,3]}, {'id1': [4,1,7]}, {'id2': [5,8,9]}, {'id1': [4,10,11]}]

What I want to achieve is to assign the same id to every set in the list that has common value with the other sets. For example 1 is a common value in the first and second set, so it should be assigned with the same id in the dictionary output. But the 4 in the second set is also common with the 4 in the fourth, this means it should be assigned with the same id as the first and second set.

I have trouble writing the algorithm anyone has any idea or at least direction to go with?

Upvotes: 1

Views: 48

Answers (1)

cherrywoods
cherrywoods

Reputation: 1361

Here is a simple fixed-point algorithm for producing your output.

# I added some more data for testing
list_sets = [{1,2,3}, {4,1,7}, {5,8,9}, {4,10,11}, {12, 13, 14}, {0, 44, 64}, {102, 99, 98}, {101, 23, 102}, {64, 23, 101}]

# 1. find all sets that intersect

# Modifying sets later on, hence copy
# Probably you could avoid some of these copies
merged = [set(s) for s in list_sets]
old_len = 0
while old_len != len(merged):
    old_len = len(merged)
    new_merged = []
    was_merged = [False] * len(merged)

    for i, group in enumerate(merged):
        if was_merged[i]:
            continue
        
        was_merged[i] = True  # we can skip group itself in the following loop
        for j, other in enumerate(merged):
            if was_merged[j]:
                continue
            
            if not group.isdisjoint(other):
                group |= other
                was_merged[j] = True
        new_merged.append(group)
    merged = new_merged

# 2. build the output from merged

# Your desired output
output = sum(([{f"id{i}": s} for s in list_sets if s <= m] for i, m in enumerate(merged)), start=[])
# output = [
#     {'id0': {1, 2, 3}},
#     {'id0': {1, 4, 7}},
#     {'id0': {4, 10, 11}},
#     {'id1': {5, 8, 9}},
#     {'id2': {12, 13, 14}},
#     {'id3': {0, 44, 64}},
#     {'id3': {98, 99, 102}},
#     {'id3': {23, 101, 102}},
#     {'id3': {23, 64, 101}}
# ]

# This output makes more sense to me
output2 = {f"id{i}": [s for s in list_sets if s <= m] for i, m in enumerate(merged)}
# output2 = {
#     'id0': [{1, 2, 3}, {1, 4, 7}, {4, 10, 11}],
#     'id1': [{5, 8, 9}],
#     'id2': [{12, 13, 14}],
#     'id3': [{0, 44, 64}, {98, 99, 102}, {23, 101, 102}, {23, 64, 101}]
# }

You can probably improve on this algorithm regarding run-time complexity, but it's a first approach.

Upvotes: 1

Related Questions