maximizing number of sets that form an intersection

Question

Let's say I have 10 sets, and each set has some random amount of positive integers.

Now, I want to maximize the number of sets that can form an intersection (NOT search for the largest intersection), given the constraint that a set can only form an intersection with another set if at least 'x' integers overlap. For instance, if x = 2, then [1,2,3,4] cannot form an intersection with [1,5,6,7] as there is only 1 integer that overlaps, not 2.

Another thing to note (although obvious) is, for x=2, if I have [1,2,3,4] and [1,2,6,7], the intersection can happen and for a third set to form an intersection, it MUST have [1,2] somewhere in the set.

I'm not able to properly form an algorithm to do this, because there exponential ways that the sets can be compared! Even if I only had 3 sets, given the size of the set, I'd have to consider every subset combination comparison given the intersection constraint.

I am generating my sets as follows:

sets = []
for i in range(0,10):
    temp = np.random.randint(1,3000)
    sets.append(set(np.random.randint(1, 3000, temp)))

Kacperito · Accepted Answer

You could count the number of occurrences of each number in the sets. Since you are trying to maximise the total number of intersections, the most common numbers will result in the biggest number of possible intersections (with the ideal scenario where every 2 sets form an intersection). Here is the code for that:

import numpy as np

# Set a seed for testing purposes
np.random.seed(1)

# Initialise the min number of elements in an intersection
x = 2

# Initialise the list of sets
sets = list()

# Initialise the count mapping
count = dict()

# Generate the sets
for i in range(0,10):
    temp = np.random.randint(1,3000)
    sets.append(set(np.random.randint(1, 3000, temp)))

# Count each number's occurrence
for s in sets:
    for number in s:
        if number in count:
            count[number] +=1
        else:
            count[number] = 1

# Sort the result (by the count number)
l = sorted(count, key=lambda x: count[x], reverse=True)

# Print the number of occurrences (within the boundary of min x elements)
print(count[l[x-1]])

# Print the numbers that give you the maximum number of intersections
print(l[:x])

The results are:

7
[2270, 2225]

In this scenario, 7 out of 10 sets contain the numbers 2270 and 2225, and therefore can form a total of 21 (6*7/2) intersections. The performance should be O(NlogN) (due to sorting) where N is the total amount of numbers.

maximizing number of sets that form an intersection

Answers (2)

Related Questions