Reputation: 1502
I have a list of sets a
and another list of sets b
.
I want to intersect each set in a
against every set in b
, and if the result is not empty, then report the index positions of the matched sets, and the length of the intersection.
I have the following nested for-loop, but wondered if there was a faster implementation possible.
import random as ra
r1 = 500
r2 = 100
r3 = 5000
a = [set([ra.randrange(r1) for _ in range(ra.randrange(r2))]) for __ in range(ra.randrange(r3))]
b = [set([ra.randrange(r1) for _ in range(ra.randrange(r2))]) for __ in range(ra.randrange(r3))]
_match_detail = []
for s1 in range(len(a)):
for s2 in range(len(b)):
x = a[s1] & b[s2]
lenx = len(x)
if lenx > 0:
_match_detail.append((s1,s2,lenx))
With about 4000 sets each in a
and b
, it takes about 20 seconds to run on my machine (an i7-8700 CPU @3.20GHz with 16GB memory). I'll eventually be dealing with hundreds of thousands of sets, so this doesn't scale nicely.
I've seen this previous question but I don't think it's applicable.
Upvotes: 1
Views: 217