Markov Unchained
Markov Unchained

Reputation: 159

Calculation of combinations/cartesian product of sets (without duplicates and without order restrictions)

I have a combinatorial problem that can be solved inefficiently using the cartesian product of multiple sets. Concretely, I have multiple items and multiple elements that satisfy each item. The problem consists of finding all possible combinations of elements that satisfy all items. For example:

items -> elements
------------------------
1 -> {a,b}     // a and b cover the item 1
2 -> {a,b}     // a and b cover the item 2
3 -> {a,b,c}   // a, b and c cover the item 3
4 -> {a,b,e,f} // a, b, e, f cover the item 4

Alternative representation:
element -> items covered
------------------------
a -> {1,2,3,4}
b -> {1,2,3,4}
c -> {3}
e -> {4}
f -> {4}

The goal is to find all combinations that cover items 1,2,3,4. Valid solutions are:

{a},{a,b},{a,c},{a,e},{a,f},{a,b,c},{a,b,e},{a,b,f},{a,b,c,e},{a,b,c,f}
{b},{b,c},{b,e},{b,f},{b,c,e},{b,c,f}

Note that the order is not important, so {a,b} = {b,a} ({a,b} x {c,d} = {c,d} x {a,b}). Also, note that {a,a,a,a}, {a,a,a,b}... are redundant combinations.

As you can see, this problem is similar to the set cover problem, where the universe of elements for this example are the items U={1,2,3,4} and the set of subsets from U is S={ab={1,2,3,4},c={3},ef{4}}, where set {1,2,3,4} is the set of items covered by the element a and b, {3} is the set of elements covered by c, and {4} is the set of elements covered by elements e and f. However, the goal here is not finding the minimal combination of sets from S that covers all elements from U, but finding all combinations of elements {a,b,c,e,f} that cover all items {1,2,3,4}.

A näive implementation could be done by performing a cartesian product between sets for 1,2,3 and 4, and then filtering the combinations that are redundant. However, this approach is very inefficient. Suppose I have this situation:

1 -> {a,b,c,d,e,f,g,h}
2 -> {a,b,c,d,e,f,g,h}
3 -> {a,b,c,d,e,f,g,h}
4 -> {a,b,c,d,e,f,g,h}
5 -> {a,b,c,d,e,f,g,h}
6 -> {a,b,c,d,e,f,g,h,i}

A cartesian product between the six sets will result in a 8^5*9=294912 combinations, when there are actually many fewer combinations, which are: {a,b,c,d,e,f,g} U {a,b,c,d,e,f,g} x {i}.

Another way to solve this problem is to enumerate all elements, skipping the combinations that are equivalent to other previously generated, and also skipping repeated elements. This is kinda easy to compute and can be implemented as an iterator that returns a combination at a time, but I don't know if there is a better way to solve this problem, or if this problem was studied before.

How would you solve this problem?

Upvotes: 3

Views: 1678

Answers (3)

smichr
smichr

Reputation: 19047

As I understand it, this is an NP-hard problem. So the main thing is to reduce the size of the problem before starting the hard work. In the case where there are lots of repeats, ['ab','ab','ab','abc'] get rid of the repeats to give ['ab','abc']. At worst, there are no shared terms and you will be generating all the cartesian products of the sets.

If you only wanted the unique subsets of the sets you indicated -- the minimal spanning sets, not the minimum spanning sets -- then this would be straightforward to generate; at worst this is generating the cartesian product of the sets, but when there are a significant number of shared elements between sets it can be done much more efficiently with the cnf_dnf routine described here. For example,

>>> sets = [
 {1, 2, 4, 6, 7, 10},
 {2, 3, 4, 6, 7, 10},
 {2, 7, 8, 9, 10},
 {1, 2, 3, 4, 5, 7, 10},
 {1, 2, 3, 5, 8, 9},
 {1, 2, 3, 5, 7, 10},
 {1, 2, 4, 6, 7},
 {2, 3, 4, 5, 6, 8, 9, 10},
 {1, 2, 4, 7, 8, 9},
 {1, 2, 3, 4, 5, 7, 8, 10}]
>>> from time import time
>>> from itertools import product

# there are many possible products

>>> t=time();sum(1 for i in product(*sets)),time()-t
(87091200, 4.305924654006958)

# there are very few minimal spanning sets

>>> t=time();sum(1 for i in cnf_dnf(sets)),time()-t
(28, 0.00010538101196289062)

Even if you wanted all the unique order-doesn't-matter full-length tuples obtained by picking one item from each set, there is an efficient way to do that and to generate them in lexicographical order using the combogen routine discussed here. That generates significantly fewer subsets:

>>> sum(1 for i in combogen(sets, len(range(sets))))
81399

But you want the minimal sets and the redundant entries which do not create repeats of what you already have. This is a more difficult problem and I am not sure that Gilad's suggestions are really going to make it that much easier. It is easy to generate the minimal spanning sets but then to figure out how to combine them without making repeats and enforcing the condition that only one item is taken from each set.

It seems to me that the best you can do is to filter the results of the unique tuples generated by combogen; for the sets defined above this take just over a second. For the given sets it does the heavy lifting, reducing the naive product of 87 million possibilities to 81 thousand; sorting the set values of the result and keeping the unique ones leaves you with less than 1000:

>>> sum(1 for i in {tuple(sorted(set(i))) for i in combogen(sets, range(len(sets)))})
934

If you have a "small" group of sets, you might just start with the following rsets which is not too bad for 7 sets of 7 unique values in each or 20 sets of 2 unique values in each; it computes the redundant sets which span the given sets:

from itertools import product
rsets = lambda L: {tuple(sorted(set(i))) for i in product(*{tuple(sorted(i)) for i in L})}
>>> ' '.join(sorted([''.join(i) for i in rsets(['ab','bc','abc','aef'])]))
'ab abc abce abcf abe abf ac ace acf bce bcf be bf'

Upvotes: 0

גלעד ברקן
גלעד ברקן

Reputation: 23955

What if you did this:

1 -> {a,b}
2 -> {b,c}
3 -> {a,b,c}
4 -> {a,e,f}

=>

a -> [1,3,4]
b -> [1,2,3]
c -> [2,3]
e -> [4]
f -> [4]

Then enumerate the combinations of the left side that provide (at least) [1,2,3,4]

For each item in the set of all-satisfying sets, enumerate combinations 
with other items.

All-Satisfying-Sets: {{a,b},{b,e},{b,f}}
Combinations within All-Satisfiying-Sets: {{a,b,e},{a,b,f},{b,e,f},{a,b,e,f}}
Others: {c}
Combinations with Others: {{a,b,c},{b,e,c},{b,f,c}
                          ,{a,b,e,c},{a,b,f,c},{b,e,f,c},{a,b,e,f,c}}

Or you could do this in Haskell:

import Data.List (union, subsequences, sort)

example1 = [(["a"],[1,2,3,4])
           ,(["b"],[1,2,3,4])
           ,(["c"],[3])
           ,(["e"],[4])
           ,(["f"],[4])]

example2 = [(["a"],[1,2,3,4,5,6])
           ,(["b"],[1,2,3,4,5,6])
           ,(["c"],[1,2,3,4,5,6])
           ,(["e"],[1,2,3,4,5,6])
           ,(["f"],[1,2,3,4,5,6])
           ,(["g"],[1,2,3,4,5,6])
           ,(["h"],[1,2,3,4,5,6])
           ,(["i"],[6])]

combs items list = 
  let unify (a,b) (a',b') = (sort (a ++ a'), sort (union b b')) 
  in map fst
     . filter ((==items) . snd) 
     . map (foldr unify ([],[])) 
     . subsequences 
     $ list


OUTPUT:

*Main> combs [1..4] example1
[["a"],["b"],["a","b"],["a","c"],["b","c"],["a","b","c"],["a","e"],["b","e"],
["a","b","e"],["a","c","e"],["b","c","e"],["a","b","c","e"],["a","f"],["b","f"], 
["a","b","f"],["a","c","f"],["b","c","f"],["a","b","c","f"],["a","e","f"],
["b","e","f"],["a","b","e","f"],["a","c","e","f"],["b","c","e","f"],
["a","b","c","e","f"]]

Upvotes: 1

Agentlien
Agentlien

Reputation: 5116

First, realize that if a set of elements does not satisfy all items, neither does any of its subsets.

Second, realize that if a set satisfies all items, so do all its supersets.

Now, all you have to do is:

Let S be the set of all elements. Let R be the empty set.

Define a function find( s, r ) which does:

If r includes s, return r.
If s does not satisfy all items, return r.

Otherwise add s to r.
For every item I in  s,
     let s' be s-I
     let s be f(s', r)

return s.

Just call find(S,R) and you have your answer.

This method performs some duplicate tests, but always kills a branch whenever it is identified as such. This leads to a lot of pruning on a large set of elements.

Both lookup of whether r includes a particular set of elements and the check if s satisfies all items can be made very fast at the expense of extra memory.

Upvotes: 2

Related Questions