Synectome
Synectome

Reputation: 96

Merging sublists based on content overlap, python 3

If we have a number of lists that contain sublists(pairs) with integers in them. If any two sublists have the same number, we want to merge them and erase the duplicate digit.

for example: alist = [[2, 1], [5, 3], [5, 1], [3, 4], [3, 1], [5, 4], [4, 1], [5, 2], [4, 2]] becomes alist = [1,2,3,4,5] And the result would be all of them merge to one list because they happen to be all share digits in common.

but not all lists will be so convenient: alist = [[4,5],[7,8],[6,7],[9,5]] would become: alist = [[4,5,9],[6,7,8]] The issue is I'm iterating over huge lists, with 10^7 entries in them. Is there a method to complete this without two nested loops? that is what I am doing currently.

Upvotes: 1

Views: 181

Answers (1)

mcdowella
mcdowella

Reputation: 19601

First of all, you can recognise which pairs share a member by creating a hash table indexed by number whose entries are sets of pairs.

You can now view the problem as finding the connected components (https://en.wikipedia.org/wiki/Connected_component_%28graph_theory%29) in a graph where the nodes are pairs and two pairs are connected if they share a number. As the Wikipedia entry notes, you can do this by using depth first search to visit all of the nodes in the graph, using the depth first search to follow all of the direct and indirect links from a node to all of the other nodes in the same component. Another approach would be to use https://en.wikipedia.org/wiki/Disjoint-set_data_structure to maintain sets of pairs, merging sets when two pairs from previously different sets share a number.

Upvotes: 1

Related Questions