user2474226
user2474226

Reputation: 1502

fast pairwise intersection of lists of sets

I have a list of sets a and another list of sets b.

I want to intersect each set in a against every set in b, and if the result is not empty, then report the index positions of the matched sets, and the length of the intersection.

I have the following nested for-loop, but wondered if there was a faster implementation possible.

import random as ra    
r1 = 500
r2 = 100
r3 = 5000
a = [set([ra.randrange(r1) for _ in range(ra.randrange(r2))]) for __ in range(ra.randrange(r3))]
b = [set([ra.randrange(r1) for _ in range(ra.randrange(r2))]) for __ in range(ra.randrange(r3))]

_match_detail = []

for s1 in range(len(a)):
    for s2 in range(len(b)):
        x = a[s1] & b[s2]
        lenx = len(x)
        if lenx > 0:
            _match_detail.append((s1,s2,lenx))
            

With about 4000 sets each in a and b, it takes about 20 seconds to run on my machine (an i7-8700 CPU @3.20GHz with 16GB memory). I'll eventually be dealing with hundreds of thousands of sets, so this doesn't scale nicely.

I've seen this previous question but I don't think it's applicable.

Upvotes: 1

Views: 217

Answers (0)

Related Questions