CaponeBoom
CaponeBoom

Reputation: 29

How to efficiently find the common part between two lists

There are two lists, which respectively represent two results of the clustering algorithm, such as com1 = [[1,2,3,4], [5, 6, 7, 8], [9]], where com1 represents a clustering result, [1,2,3,4] represents nodes 1,2,3,4 belong to the same class. [5,6,7,8] indicates that nodes 5,6,7,8 belong to the same class, and 9 belongs to a separate class. com2 = [[1, 2, 4], [3], [5, 6, 7, 8], [9]]. Now, I need to find out the common parts between com1 and com2, such as [1,2,4],[5,6,7,8],[9].

Is there an efficient way to solve this problem?

Upvotes: 0

Views: 89

Answers (1)

trincot
trincot

Reputation: 350272

Assuming that a given value can only occur in one sublist in com1 and in one sublist in com2, we can observe the following:

Two values will belong to the same sublist in the result when they belong to the same sublist in com1 and also in the same sublist in com2.

So we could collect for each value the two indices of the sublists they belong to: one index that identifies the sublist in com1, and another that identifies the sublist in com2.

We can use those pairs as keys that uniquely identify a target sublist, and populate those sublists accordinly:

from collections import defaultdict

def combine(com1, com2):
    d = defaultdict(list)
    for com in com1, com2:
        for i, lst in enumerate(com):
            for val in lst:
                d[val].append(i)
    res = defaultdict(list)
    for val, key in d.items():
        res[tuple(key)].append(val)
    return list(res.values())

# Example 1
com1 = [[1,2,3,4], [5, 6, 7, 8], [9]]
com2 = [[1, 2, 4], [3], [5, 6, 7, 8], [9]]
print(combine(com1, com2))  # [[1, 2, 4], [3], [5, 6, 7, 8], [9]]

# Example 2
com1 = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
com2 = [[1, 9], [2, 3, 4], [5, 6, 7, 8], [9]]
print(combine(com1, com2))  # [[1], [2, 3], [4], [5, 6], [7, 8], [9]]

If we assume an amortised constant time complexity for dictionary get/set actions, then this brings the total time complexity to O(𝑛) where 𝑛 represents the number of values in the list that gets partitioned.

Upvotes: 2

Related Questions