Reputation: 680
We are provided with 5 different sets:
s1 = {a,b,c}
s2 = {c,d}
s3 = {a,g,h}
s4 = {d,e,f}
s5 = {g,k,l}
The goal is to find the minimal amount of items such that each set is represented at least once. In this case, we can easily see that the idea solution is {a,d,g}
. Is there a way to do this programmatically?
Edit: this is what I have so far (r
is the list of sets)
for i in r:
i.sort()
r.sort(reverse=True)
arr = []
arr.append(r[0][0])
def isInArr(k):
for j in k:
if j in arr:
return False
return True
for i in r[1:]:
if isInArr(i):
arr.append(i[0])
Edit 2:
This code combines the answer of Connor and bruteforces the (to my knowledge) optimal solution.
def isnotarr(k,rs):
for j in k:
if j in rs:
return True
return False
def most_frequent(List):
return max(set(List), key = List.count)
##s1 = set(["a", "b", "c"])
##s2 = set(["c", "d"])
##s3 = set(["a", "g", "h"])
##s4 = set(["d", "e", "f"])
##s5 = set(["g", "k", "l"])
set_list = [set(i) for i in r]
return_set = []
while len(set_list) > 0:
elements = []
for s in set_list:
for el in s:
elements.append(el)
element = most_frequent(elements)
return_set.append(element)
new_set_list = []
for s in set_list:
if element not in s:
new_set_list.append(s)
set_list = new_set_list
print "================initial set found============\n"
print(return_set)
print "================initial set found============\n"
def isvalidcomb(cm):
for el in r:
if isnotarr(el,cm):
pass
else:
return False
return True
def bfopt(n):
combs = itertools.combinations(return_set,n)
for i in combs:
if isvalidcomb(i):
return i
return None
for i in range(len(return_set),0,-1):
print "===========computing sets for maxlen %d============\n"%(i)
tmp = bfopt(i)
if tmp is not None:
print tmp
Upvotes: 0
Views: 1094
Reputation: 680
There is another way to go about this, as I just learned. Essentially, each element is a bool
variable and each set is a collection of OR
constraints. Each set has to return true
with the minimum number of elements set to true
. This turns out to be very easily solvable by a linear solver like z3. Simply set the sum of sums of true
in each set as the variable to be minimized and voila.
Upvotes: 0
Reputation: 2397
This is precisely the set cover problem, a classic discrete optimization problem. It's NP-hard but there are many good algorithms for it, both exact and approximate.
Upvotes: 2
Reputation: 106488
You can use itertools.combinations
to pick an increasing number of items out of all distinct items and check if the picked set of items has at least one item in every set in the list:
from itertools import count, combinations
r = [{'a', 'b', 'c'},
{'c', 'd'},
{'a', 'g', 'h'},
{'d', 'e', 'f'},
{'g', 'k', 'l'}]
all_items = set.union(*r)
print(next(set(picks) for size in count(1)
for picks in combinations(all_items, size)
if all(any(i in s for i in picks) for s in r)))
This outputs: (Your sample input has more than one optimal solution and this outputs only one of them.)
{'c', 'd', 'g'}
If you want all optimal solutions, you can use itertools.groupby
over the generator expression above with len
as the key function:
from itertools import groupby
list(next(groupby((set(picks) for size in count(1)
for picks in combinations(all_items, size)
if all(any(i in s for i in picks) for s in r)), len))[1])
This returns:
[{'f', 'c', 'g'},
{'e', 'c', 'g'},
{'a', 'd', 'g'},
{'b', 'd', 'g'},
{'c', 'd', 'g'},
{'a', 'd', 'l'},
{'a', 'd', 'k'}]
Upvotes: 0
Reputation: 7724
First: each set can be represented by a power of 2: si = 2^(i-1)
.
Each letter can be considered an item of weight 1 that has a certain value.
The value of a letter can be evaluated as the sum of the sets it represents.
e.g.: a represents s1 and s3, so value[a] = 2^(1-1) + 2^(3-1) = 3.
Now your goals is to find the amount of itens with minimum weight such that the sum of its values is (1 + 2 + 4 + 8 + 16) = 31. This is basically the knapsack problem, a well known dynamic programming problem. Each item would be a letter and your knapsack has size 5 (at most). You need to get a value of 31 within this size.
As for the value of each letter, you can just preprocess.
Upvotes: 2
Reputation: 675
This is how I did it.
def most_frequent(List):
return max(set(List), key = List.count)
s1 = set(["a", "b", "c"])
s2 = set(["c", "d"])
s3 = set(["a", "g", "h"])
s4 = set(["d", "e", "f"])
s5 = set(["g", "k", "l"])
set_list = [s1, s2, s3, s4, s5]
return_set = []
while len(set_list) > 0:
elements = []
for s in set_list:
for el in s:
elements.append(el)
element = most_frequent(elements)
return_set.append(element)
new_set_list = []
for s in set_list:
if element not in s:
new_set_list.append(s)
set_list = new_set_list
print(return_set)
Upvotes: 2