rutchkiwi
rutchkiwi

Reputation: 556

Multiset of sets - choosing the elements which cover the most sets

Given a multiset of sets M, for example:

M = ({a}, {a}, {b}, {c}, {c}, {c}, {a,b}, {a,b}, {a,c}, {a,b,c}, {d, e})

I want to pick the set of elements S of size K for which the maximum number of elements of M are are subsets of S. Note that this set S does not have to be one of the sets in M - it could be any elements from the union of all sets in M. (So {a,d} would be a valid candidate)

So for K=1 I'd pick S = {c} as that covers 3 sets in M: ({c}, {c}, {c})

For K=2 I'd pick S = {a,c} as that covers 6 sets in M. ({a}, {a}, {c}, {c}, {c}, {a,c})

Does this problem have a name? And how can I solve it in an efficient manner?

Practical application: The practical problem I'm trying to solve is related to packing orders in a warehouse. I want to choose which products a packer should pick so that they can pack the most orders while minimizing the number of different products they need to pick.

Upvotes: 2

Views: 411

Answers (3)

Need-4-Steve
Need-4-Steve

Reputation: 77

I just asked a very similar question, and wasn't really able to find a great answer, but I was able to figure this out eventually on my own. I was hoping if I shared this with you, you could provide any insight you might have found. This doesn't really use any 3rd party dependencies, but it does require the "DS" extension for PHP to use Sets. You could just use arrays though, sets aren't really required, you would just have to write some functions for the set functions, ie set->contains(...), etc., if you're using PHP. Python probably already has these data structures although I'm not entirely familiar so I can't say for sure. Anyways, here's the code, I hope this helps, it could definitely use some optimization, but it finds the correct answers and matches the solver answers from the python code provided above by ldog:

Okay, so I figured out an answer, although it may not be the most optimized solution ever. If anyone can help me get this running faster, or perhaps help me write this in such a way that it is able to find the optimal top 3, I would appreciate it. Here's my code, note that I am using the data structure "Set", available from https://github.com/php-ds

class ProductionOptimizer{
    private int $CARDINALITY=6;
    private CombinationsIterator $combinationsIterator;
    private array $tickets;
    private Set $threads;

    public function __construct(array $array){
        $this->tickets=$array;
        $start = microtime(TRUE);
        $this->threads=new Set();
        foreach ($array as $ticketThreads){
            foreach ($ticketThreads as $thread)
                $this->threads->add($thread);
        }
        $end = microtime(TRUE);
        // print_r($this->threads);
        echo PHP_EOL."Creating the Set took " . ($end - $start) . " seconds to complete.";
        $this->combinationsIterator=new CombinationsIterator($this->getThreadsAsArray(),6);
    }

    public function outputThreads() : void {
        print_r($this->threads);
    }

    public function getThreadsAsArray() : array {
        return $this->threads->toArray();
    }

    public function getCombinationsIterator() : CombinationsIterator{
        return $this->combinationsIterator;
    }

    public function createNewCombinationsIterator(array $array, int $subsetCardinality=null) : CombinationsIterator {
        if(is_null($subsetCardinality)) $subsetCardinality=$this->CARDINALITY;
        $this->combinationsIterator=new CombinationsIterator($array, $subsetCardinality);
        return $this->combinationsIterator;
    }

    public function removeFromSet(array $array){
        // var_dump($this->threads);
        echo "Removing Threads(before count :".$this->threads->count().")...".PHP_EOL;
        $this->threads->remove(...$array);
        // var_dump($this->threads);
        echo "Removing Threads(after count: ".$this->threads->count().")...".PHP_EOL;
        $this->combinationsIterator=new CombinationsIterator($this->getThreadsAsArray(),6);
    }

    public function getSet() : Set {
        return $this->threads;
    }
    public function getThreads() : Set {
        return $this->threads;
    }

    /* not need if using Set class */
    /*
        Set:
            The code took 4.5061111450195E-5 seconds to complete.
        This Function:
            The code took 0.0054061412811279 seconds to complete.
    */
    public function deduplicateArray($array){
        array_walk_recursive($array, function($v) use (&$r){$r[]=$v;});
        return array_values(array_unique($r));
    }

}

class CombinationsIterator implements Iterator
{
    protected $c = null;
    protected $s = null;
    protected $n = 0;
    protected $k = 0;
    protected $pos = 0;

    function __construct($s, $k) {
        if(is_array($s)) {
            $this->s = array_values($s);
            $this->n = count($this->s);
        } else {
            $this->s = (string) $s;
            $this->n = strlen($this->s);
        }
        $this->k = $k;
        $this->rewind();
    }
    function key() {
        return $this->pos;
    }
    function current() {
        $r = array();
        for($i = 0; $i < $this->k; $i++)
            $r[] = $this->s[$this->c[$i]];
        return is_array($this->s) ? $r : implode('', $r);
    }
    function next() {
        if($this->_next())
            $this->pos++;
        else
            $this->pos = -1;
    }
    function rewind() {
        $this->c = range(0, $this->k);
        $this->pos = 0;
    }
    function valid() {
        return $this->pos >= 0;
    }

    protected function _next() {
        $i = $this->k - 1;
        while ($i >= 0 && $this->c[$i] == $this->n - $this->k + $i)
            $i--;
        if($i < 0)
            return false;
        $this->c[$i]++;
        while($i++ < $this->k - 1)
            $this->c[$i] = $this->c[$i - 1] + 1;
        return true;
    }

}

and here is some sample data to test. this will get the top 3 ordered sequences, removing the matching sets from the the universe of sets with each iteration:

function humanReadableMemory($size)
{
    $unit=array('b','kb','mb','gb','tb','pb');
    return @round($size/pow(1024,($i=floor(log($size,1024)))),2).' '.$unit[$i];
}
$tickets = [
    [1029],
    [1029],
    [1029],
    [1029],
    [1029],
    [1029],
    [1029],
    [1029, 1049],
    [1029, 1049],
    [1029, 1049, 1080, 1112, 1125, 1188],
    [1029, 1049, 1125],
    [1029, 1049, 1188, 1278],
    [1029, 1056, 1138, 1158, 1182, 1514, 1531],
    [1029, 1071, 1076],
    [1029, 1074, 1075, 1092, 1093, 1242, 1523, 1525],
    [1029, 1094, 1125, 1138, 1158],
    [1029, 1094, 1125, 1278],
    [1029, 1094, 1182, 1221, 1242, 1505],
    [1029, 1094, 1278, 1298],
    [1029, 1094, 1298],
    [1029, 1112, 1125, 1298, 1516],
    [1029, 1112, 1125, 1505],
    [1029, 1112, 1505],
    [1029, 1125],
    [1029, 1125],
    [1029, 1125, 1138],
    [1029, 1138, 1188, 1514, 1517],
    [1029, 1138, 1242, 1317, 1505, 1525],
    [1029, 1242],
    [1029, 1242],
    [1029, 1242, 1270, 1524],
    [1029, 1242, 1370, 1505],
    [1029, 1298],
    [1029, 1298, 1505],
    [1029, 1317],
    [1029, 1505],
    [1029, 1505],
    [1029, 1505, 1525],
    [1038, 1076, 1177, 1182],
    [1045, 1048, 1097, 1100],
    [1046, 1125, 1182, 1242, 1278],
    [1049],
    [1049],
    [1049],
    [1049],
    [1049],
    [1049],
    [1049],
    [1049],
    [1049]
];
$po=new ProductionOptimizer($tickets);
$start = microtime(TRUE);
$end = microtime(TRUE);
echo "Current Memory Usage: ".humanReadableMemory(memory_get_usage()).PHP_EOL;
echo "Peak Memory Usage: ".humanReadableMemory(memory_get_peak_usage()).PHP_EOL;
echo "Starting Iterations and Combination Testing...".PHP_EOL;
$rankedOutput=[];
$start = microtime(TRUE);
for ($i=0;$i<3;$i++) {
    $matchCountArray=[];$bestChoice=0;$bestChoiceId=-1;
    foreach ($po->getCombinationsIterator() as $comboTest){
        $combinationString=implode(",",$comboTest);
        $matchCountArray[$combinationString]=0;
        $comboTestSet=new Set($comboTest);
        foreach ($tickets as $ticketIdx=>$ticketElements) {
            $ticketElementsSet = new Set($ticketElements);
            $testsArray[$combinationString]+=($comboTestSet->contains(...$ticketElements) ? 1 : 0);
        }
        if ($matchCountArray[$combinationString]>$bestChoice) {
            $bestChoice = $matchCountArray[$combinationString];
            $bestChoiceId=$combinationString;
        } else {
            //trying to save memory
            unset($matchCountArray[$combinationString]);
        }
    }
    $currentBestChoiceSet=new Set(explode(",",$bestChoiceId));
    $currentUniverseSet=$po->getSet()->copy();
    for($tmpJ=0;$tmpJ<$currentUniverseSet->count();$tmpJ++){
        $testAgainstSet=$currentUniverseSet->get($tmpJ);
        if ($currentBestChoiceSet->contains(...$testAgainstSet->toArray())){
            $currentUniverseSet->remove($testAgainstSet);
        }
    }
    $tickets = [];
    foreach ($currentUniverseSet as $singleSet){
        $tickets[]=$singleSet->toArray();
    }
    $po=new ProductionOptimizer($tickets);
    $rankedOutput[$i]=["greatest_matching_sequence"=>$bestChoiceId, "coverage_count"=>$bestChoice];
}
echo "RankedOutput:".PHP_EOL;
var_dump($rankedOutput);
echo PHP_EOL;
$end = microtime(TRUE);
echo PHP_EOL."The code took " . ($end - $start) . " seconds to complete.".PHP_EOL;

Upvotes: 0

ldog
ldog

Reputation: 12151

Based on the updated question statement, here is a solution based on the linear programming relaxation using CVXPY.

Note, CVXPY allows toggling the relaxation on and off with the boolean parameter, thus you can experiment with both the relaxation and the branch and bound solver that CVXPY ships with.

#!/usr/bin/python
from __future__ import print_function

import cvxpy as cp

# Short-hand notation
a = 'a'
b = 'b'
c = 'c'
d = 'd'
e = 'e'
M = ({a}, {a}, {b}, {c}, {c}, {c}, {a,b}, {a,b}, {a,c}, {a,b,c}, {d, e})

# Set problem parameter
K = 2

# Construct union of all sets
U = set()
for S in M:
    U = U.union(S)
U = sorted(list(U)) #guarantee order

idx = { u : i for i,u in enumerate(U) }

n = len(U)

# Setting `use_relaxation` = True will have a massive impact on runtime as it
# will not require a branch and bound search
use_relaxation = False
use_boolean = not use_relaxation

x = cp.Variable(n, boolean=use_boolean)
s = cp.Variable(len(M), boolean=use_boolean)

# Construt the objective
constraints = [x>=0,x<=1,cp.sum(x) == K]
for iS,S in enumerate(M):
    set_cover = 0
    for v in S:
        set_cover = set_cover + x[idx[v]]
        constraints.append(x[idx[v]] >= s[iS])
    constraints.append(s[iS] >= set_cover - len(S) + 1)
objective = cp.Maximize(cp.sum(s))
prob = cp.Problem(objective,constraints)
prob.solve()

print("Status: ", prob.status)
print("The optimal value is: ", prob.value)
print("The x solution is: ", x.value)
print("The s solution is: ", s.value)

x_sol = [ round(_x) for _x in x.value ]
M_sol = [ round(_s) for _s in s.value ]

# get the final set S
S = [ v for v,_x in zip(U,x_sol) if _x]

# get which sets are covered
M_s = [ _S for _S,_M in zip(M,M_sol) if _M ]

print ("This is the selected set of elements of size K =", K,",")
print (S)
print ("These are the sets of M covered by S,")
print (M_s)

This produces the output,

Status:  optimal
The optimal value is:  5.999999999608626
The x solution is:  [1.00000000e+00 3.51527313e-11 1.00000000e+00 8.39903664e-11
 8.39903664e-11]
The s solution is:  [1.00000000e+00 1.00000000e+00 3.65886155e-11 1.00000000e+00
 1.00000000e+00 1.00000000e+00 2.37617387e-11 2.37617387e-11
 1.00000000e+00 1.77267846e-11 7.37231140e-11]
This is the selected set of elements of size K = 2 ,
['a', 'c']
These are the sets of M covered by S,
[set(['a']), set(['a']), set(['c']), set(['c']), set(['c']), set(['a', 'c'])]

When use_relaxation is False, this solution is guaranteed to produce the correct solution, possibly very slowly depending on how well the branch and bound search performs. When use_relaxation is True, this will solve very quickly, but may not produce the correct solution in many cases (and in some cases rounding will not give the correct number of elements in the set S.)

I recommend experimenting on a much larger set of instances with both settings and seeing how badly the rounded relaxed solution performs with respect to the optimal solution (the relaxation is not tight.)

Upvotes: 1

subhanshu kumar
subhanshu kumar

Reputation: 392

Try this:

#to count the subsets
def count_subset(a):
    count =0
    for b in M:
        if b.issubset(a):
            count =count+1
    return count

def compute(k):
    x ={"set":{},"count":0}
    for a in M:
        if(len(a)==k):
            count =count_subset(a)
            if count > x["count"]:
                x["set"] =a
                x["count"]=count
    print(x)

M = ({"a"}, {"a"}, {"b"}, {"c"}, {"c"}, {"c"}, {"a","b"}, {"a","b"}, {"a","c"}, {"a","b","c"})
compute(1)

Upvotes: 0

Related Questions