Reputation: 345
This is a problem I can't currently find on Leetcode or StackOverflow. Lets say you have a set of lists of numbers or references as so:
[[1,2,3],[3,4,5],[6,7,8],[8,9],[10],[7]]
What is the fastest algorithm to merge these lists such that the output would be:
[[1,2,3,4,5],[6,7,8,9],[10]]
Many thanks.
Upvotes: 3
Views: 295
Reputation: 77474
This is actually not a clustering problem, but a set union problem.
By the names "union find" or "disjoint-set" you can find some well discussed approaches to make these things fast.
Upvotes: 0
Reputation: 4772
RosettaCode has a task set consolidation that has a Python example that can be modified to work with lists:
def consolidate(lists):
setlist = [set(lst) for lst in lists if lst]
for i, s1 in enumerate(setlist):
if s1:
for s2 in setlist[i+1:]:
intersection = s1.intersection(s2)
if intersection:
s2.update(s1)
s1.clear()
s1 = s2
return sorted(sorted(s) for s in setlist if s)
print(consolidate([[1,2,3],[3,4,5],[6,7,8],[8,9],[10],[7]]))
Output:
[[1, 2, 3, 4, 5], [6, 7, 8, 9], [10]]
Upvotes: 0
Reputation: 65498
Prepare a graph from the lists as follows and find the connected components using depth-first search.
Each list gives rise to undirected edges that connected the first element to the others, e.g.,
[1,2,3] -> [(1,2), (1,3)]
[3,4,5] -> [(3,4), (3,5)]
[6,7,8] -> [(6,7), (6,8)]
[8,9] -> [(8,9)]
[10] -> []
[7] -> []
Then run depth-first search to find connected components. In Python, everything goes something like this.
import collections
def merge(lsts):
neighbors = collections.defaultdict(set)
for lst in lsts:
if not lst:
continue
for x in lst:
neighbors[x].add(lst[0])
neighbors[lst[0]].add(x)
visited = set()
for v in neighbors:
if v in visited:
continue
stack = [v]
component = []
while stack:
w = stack.pop()
if w in visited:
continue
visited.add(w)
component.append(w)
stack.extend(neighbors[w])
yield component
Upvotes: 3