Reputation: 1239
I tried finding the answers of few selected answers, but they doesn't seem working. I have to merge list of lists.
[[283, 311], [311, 283, 316], [150, 68], [86, 119], [259, 263], [263, 259, 267], [118, 87], [262, 264], [264, 262], [85, 115], [115, 85], [244, 89], [140, 76, 82, 99], [236, 111], [330, 168], [76, 63, 107, 124, 128, 135, 140], [131, 254], [254, 131], [21, 38], [38, 21], [220, 291], [291, 220], [296, 46], [64, 53, 57, 61, 63, 65, 66, 76, 96, 100, 103, 114, 123, 127, 128, 130, 148, 149], [274, 240], [157, 225, 234], [225, 247], [233, 44], [89, 244], [80, 101], [210, 214], [78, 155], [55, 139], [102, 74, 75, 132], [105, 252], [149, 55, 59, 63, 71, 73, 81, 100, 102, 116, 122, 138, 146], [97, 231], [231, 97], [155, 78], [239, 305], [305, 239], [145, 94, 248], [147, 150], [61, 64], [152, 219], [219, 152], [246, 250], [252, 105], [223, 235], [235, 223], [237, 60, 344], [344, 237], [182, 129], [331, 117], [12, 2, 8, 10, 13, 15], [250, 246]]
All I want is, for example, [283,311]
exists in [311,283,316]
. then both should be merged making only one list in the above list. I need to merge a list if it exists in some other.
Please note, I can do it using loop inside loop, but looking for a pythonic way to achieve this. Also, if you know how to merge lists sharing atleaast one common element, please share.
Edit :
Please don't forget, looking for a pythonic approach. I think it is not possible using any core Pythonic approach. Then what should be next possible solution using loops. I need efficiency as have to merge list of lists with more than a million items every half an hour. I did using for loop inside for loop then comparing two items, but it is taking too much of time.
Upvotes: 2
Views: 294
Reputation: 363737
Here's an alternative approach that I think should be faster than @mgilson's if you have a large number of small sets:
from collections import defaultdict
sets = set(map(frozenset, lists))
def remove_subsets(sets):
# map each element to the sets in which it occurs
sets_containing = defaultdict(set)
for s in sets:
for x in s:
sets_containing[x].add(s)
for s in sets:
supersets = set.intersection(*(sets_containing[x] for x in s))
if len(supersets) == 1:
yield s
The difference is mainly in the final loop, which runs not through all n(n-1)/2 distinct pairs of sets, but only though the n sets in the outer loop, then through a set of candidate supersets that contain some element of the set under consideration. It can be optimized further by stopping the reduce
early when it produces an empty set.
Upvotes: 3
Reputation: 310049
I think this works ...
a = [[283, 311], [311, 283, 316], [150, 68], [86, 119], [259, 263], [263, 259, 267], [118, 87], [262, 264], [264, 262], [85, 115], [115, 85], [244, 89], [140, 76, 82, 99], [236, 111], [330, 168], [76, 63, 107, 124, 128, 135, 140], [131, 254], [254, 131], [21, 38], [38, 21], [220, 291], [291, 220], [296, 46], [64, 53, 57, 61, 63, 65, 66, 76, 96, 100, 103, 114, 123, 127, 128, 130, 148, 149], [274, 240], [157, 225, 234], [225, 247], [233, 44], [89, 244], [80, 101], [210, 214], [78, 155], [55, 139], [102, 74, 75, 132], [105, 252], [149, 55, 59, 63, 71, 73, 81, 100, 102, 116, 122, 138, 146], [97, 231], [231, 97], [155, 78], [239, 305], [305, 239], [145, 94, 248], [147, 150], [61, 64], [152, 219], [219, 152], [246, 250], [252, 105], [223, 235], [235, 223], [237, 60, 344], [344, 237], [182, 129], [331, 117], [12, 2, 8, 10, 13, 15], [250, 246]]
aa = set(map(frozenset,a)) #eliminate duplicates
result = [x for x in aa if not any(x < y for y in aa)]
Note that this gives you a list of frozenset
objects, but you can easily turn that back into a list of lists:
result = [list(fset) for fset in result]
Unfortunately, it has quadratic complexity ... I'm not sure if there's anything to really be done about that ... (keeping it as lists would result in much worse complexity FWIW)
Here's a second approach which is a little better algorithmically:
from itertools import takewhile
def foo2(aa):
aa = sorted(set(map(frozenset,a)),key=len)
def conditional(x,aa):
xlen = len(x)
return any(x < y for y in takewhile(lambda z: xlen <= len(z),aa))
return [x for x in aa if not conditional(x,aa)]
Of course, It would need to be timed on you real data to see if it actually performs better (for the test data, the lists aren't big enough and the overhead of generating the conditional
function and the takewhile
instance and the lambda
make it not worthwhile.
for the curious, here's my benchmark:
a = [[283, 311], [311, 283, 316], [150, 68], [86, 119], [259, 263], [263, 259, 267], [118, 87], [262, 264], [264, 262], [85, 115], [115, 85], [244, 89], [140, 76, 82, 99], [236, 111], [330, 168], [76, 63, 107, 124, 128, 135, 140], [131, 254], [254, 131], [21, 38], [38, 21], [220, 291], [291, 220], [296, 46], [64, 53, 57, 61, 63, 65, 66, 76, 96, 100, 103, 114, 123, 127, 128, 130, 148, 149], [274, 240], [157, 225, 234], [225, 247], [233, 44], [89, 244], [80, 101], [210, 214], [78, 155], [55, 139], [102, 74, 75, 132], [105, 252], [149, 55, 59, 63, 71, 73, 81, 100, 102, 116, 122, 138, 146], [97, 231], [231, 97], [155, 78], [239, 305], [305, 239], [145, 94, 248], [147, 150], [61, 64], [152, 219], [219, 152], [246, 250], [252, 105], [223, 235], [235, 223], [237, 60, 344], [344, 237], [182, 129], [331, 117], [12, 2, 8, 10, 13, 15], [250, 246]]
def foo1(aa):
aa = set(map(frozenset,a)) #eliminate duplicates
return [x for x in aa if not any(x < y for y in aa)]
from itertools import takewhile
def conditional(x,aa,takewhile=takewhile):
xlen = len(x)
return any(x < y for y in takewhile(lambda z: xlen <= len(z),aa))
def foo2(aa):
aa = sorted(set(map(frozenset,a)),key=len)
return [x for x in aa if not conditional(x,aa)]
print set(foo1(a)) == set(foo2(a))
import timeit
N = 10000
print timeit.timeit('foo1(a)','from __main__ import foo1,a',number=N)
print timeit.timeit('foo2(a)','from __main__ import foo2,a',number=N)
Upvotes: 3
Reputation: 123732
To answer your latter question; it's a connected-components problem.
>>> import networkx as nx
>>> G = nx.Graph()
>>> for component in l:
...: G.add_edges_from(pairs(component))
...:
>>> nx.connected_components(G)
# the answer
Upvotes: 8