Reputation: 29
There are two lists, which respectively represent two results of the clustering algorithm, such as com1 = [[1,2,3,4], [5, 6, 7, 8], [9]], where com1 represents a clustering result, [1,2,3,4] represents nodes 1,2,3,4 belong to the same class. [5,6,7,8] indicates that nodes 5,6,7,8 belong to the same class, and 9 belongs to a separate class. com2 = [[1, 2, 4], [3], [5, 6, 7, 8], [9]]. Now, I need to find out the common parts between com1 and com2, such as [1,2,4],[5,6,7,8],[9].
Is there an efficient way to solve this problem?
Upvotes: 0
Views: 89
Reputation: 350272
Assuming that a given value can only occur in one sublist in com1
and in one sublist in com2
, we can observe the following:
Two values will belong to the same sublist in the result when they belong to the same sublist in com1
and also in the same sublist in com2
.
So we could collect for each value the two indices of the sublists they belong to: one index that identifies the sublist in com1
, and another that identifies the sublist in com2
.
We can use those pairs as keys that uniquely identify a target sublist, and populate those sublists accordinly:
from collections import defaultdict
def combine(com1, com2):
d = defaultdict(list)
for com in com1, com2:
for i, lst in enumerate(com):
for val in lst:
d[val].append(i)
res = defaultdict(list)
for val, key in d.items():
res[tuple(key)].append(val)
return list(res.values())
# Example 1
com1 = [[1,2,3,4], [5, 6, 7, 8], [9]]
com2 = [[1, 2, 4], [3], [5, 6, 7, 8], [9]]
print(combine(com1, com2)) # [[1, 2, 4], [3], [5, 6, 7, 8], [9]]
# Example 2
com1 = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
com2 = [[1, 9], [2, 3, 4], [5, 6, 7, 8], [9]]
print(combine(com1, com2)) # [[1], [2, 3], [4], [5, 6], [7, 8], [9]]
If we assume an amortised constant time complexity for dictionary get/set actions, then this brings the total time complexity to O(𝑛) where 𝑛 represents the number of values in the list that gets partitioned.
Upvotes: 2