Reputation: 680

Minimal subset to cover all sets

We are provided with 5 different sets:

s1 = {a,b,c}
s2 = {c,d}
s3 = {a,g,h}
s4 = {d,e,f}
s5 = {g,k,l}

The goal is to find the minimal amount of items such that each set is represented at least once. In this case, we can easily see that the idea solution is {a,d,g}. Is there a way to do this programmatically?

Edit: this is what I have so far (r is the list of sets)

for i in r:
    i.sort()

r.sort(reverse=True)

arr = []

arr.append(r[0][0])

def isInArr(k):
    for j in k:
        if j in arr:
            return False
    return True

for i in r[1:]:
    if isInArr(i):
        arr.append(i[0])

Edit 2:

This code combines the answer of Connor and bruteforces the (to my knowledge) optimal solution.

def isnotarr(k,rs):
    for j in k:
        if j in rs:
            return True
    return False

def most_frequent(List):
   return max(set(List), key = List.count)

##s1 = set(["a", "b", "c"])
##s2 = set(["c", "d"])
##s3 = set(["a", "g", "h"])
##s4 = set(["d", "e", "f"])
##s5 = set(["g", "k", "l"])
set_list = [set(i) for i in r]

return_set = []
while len(set_list) > 0:
   elements = []
   for s in set_list:
       for el in s:
           elements.append(el)
   element = most_frequent(elements)
   return_set.append(element)
   new_set_list = []
   for s in set_list:
       if element not in s:
           new_set_list.append(s)
   set_list = new_set_list

print "================initial set found============\n"
print(return_set)
print "================initial set found============\n"



def isvalidcomb(cm):
    for el in r:
        if isnotarr(el,cm):
            pass
        else:
            return False
    return True

def bfopt(n):
    combs = itertools.combinations(return_set,n)
    for i in combs:
        if isvalidcomb(i):
            return i
    return None

for i in range(len(return_set),0,-1):
    print "===========computing sets for maxlen %d============\n"%(i)
    tmp = bfopt(i)
    if tmp is not None:
        print tmp

Upvotes: 0

Answers (5)

S. L.

Reputation: 680

There is another way to go about this, as I just learned. Essentially, each element is a bool variable and each set is a collection of OR constraints. Each set has to return true with the minimum number of elements set to true. This turns out to be very easily solvable by a linear solver like z3. Simply set the sum of sums of true in each set as the variable to be minimized and voila.

Upvotes: 0

LarrySnyder610

Reputation: 2397

This is precisely the set cover problem, a classic discrete optimization problem. It's NP-hard but there are many good algorithms for it, both exact and approximate.

Upvotes: 2

blhsing

Reputation: 106488

You can use itertools.combinations to pick an increasing number of items out of all distinct items and check if the picked set of items has at least one item in every set in the list:

from itertools import count, combinations
r = [{'a', 'b', 'c'},
     {'c', 'd'},
     {'a', 'g', 'h'},
     {'d', 'e', 'f'},
     {'g', 'k', 'l'}]
all_items = set.union(*r)
print(next(set(picks) for size in count(1)
    for picks in combinations(all_items, size)
    if all(any(i in s for i in picks) for s in r)))

This outputs: (Your sample input has more than one optimal solution and this outputs only one of them.)

{'c', 'd', 'g'}

If you want all optimal solutions, you can use itertools.groupby over the generator expression above with len as the key function:

from itertools import groupby
list(next(groupby((set(picks) for size in count(1)
    for picks in combinations(all_items, size)
    if all(any(i in s for i in picks) for s in r)), len))[1])

This returns:

[{'f', 'c', 'g'},
 {'e', 'c', 'g'},
 {'a', 'd', 'g'},
 {'b', 'd', 'g'},
 {'c', 'd', 'g'},
 {'a', 'd', 'l'},
 {'a', 'd', 'k'}]

Upvotes: 0

Daniel

Reputation: 7724

First: each set can be represented by a power of 2: si = 2^(i-1).

Each letter can be considered an item of weight 1 that has a certain value.

The value of a letter can be evaluated as the sum of the sets it represents.

e.g.: a represents s1 and s3, so value[a] = 2^(1-1) + 2^(3-1) = 3.

Now your goals is to find the amount of itens with minimum weight such that the sum of its values is (1 + 2 + 4 + 8 + 16) = 31. This is basically the knapsack problem, a well known dynamic programming problem. Each item would be a letter and your knapsack has size 5 (at most). You need to get a value of 31 within this size.

As for the value of each letter, you can just preprocess.

Upvotes: 2

Connor

Reputation: 675

This is how I did it.

def most_frequent(List):
   return max(set(List), key = List.count)

s1 = set(["a", "b", "c"])
s2 = set(["c", "d"])
s3 = set(["a", "g", "h"])
s4 = set(["d", "e", "f"])
s5 = set(["g", "k", "l"])
set_list = [s1, s2, s3, s4, s5]

return_set = []
while len(set_list) > 0:
   elements = []
   for s in set_list:
       for el in s:
           elements.append(el)
   element = most_frequent(elements)
   return_set.append(element)
   new_set_list = []
   for s in set_list:
       if element not in s:
           new_set_list.append(s)
   set_list = new_set_list
print(return_set)

Upvotes: 2

Minimal subset to cover all sets

Answers (5)

Related Questions