Finding n largest intersections of collection of sets in Python

Question

I have a collection (dictionary) of sets of strings, like {1: {'a', 'b'}, ...} I need to find the n largest intersections i.e. intersections of the largest subsets of the collection. The obvious brute-force approach:

for i in range(len(collection),2,-1):
    for subset in combinations(sorted(collection), i):
        intersected = set.intersection(*(collection[k] for k in subset))
        if len(intersected)>0:
            yield len(subset), intersected

is very slow. Is there some efficient way/library to do this?

gen-y-s · Accepted Answer

Just count the number of occurances of each string. The maximum number of occurances is the largest intersection of subsets (assuming a strings are unique in each subset):

coll = {1:{'a','b'}, 2:{'b','e'}, 3:{'a','c'}, 4:{'b','f'}}
print(coll)

d=dict()
for subs in coll.values():
  for s in subs:
    d[s]=d.setdefault(s, 0)+1

m=max(d.values())
print(m)

Finding n largest intersections of collection of sets in Python

Answers (2)

Related Questions