Reputation: 31
I am trying to solve the following problem:
Minimal example:
//Syntax for blocking sets:
//Name (number of elements required to pick): {elements in set}
//Syntax for picking set:
//Name: {elements in set}
BS1 (1): {0, 1}
BS2 (1): {1, 2}
Picking Set: {1, 2}
Possible solution:
BS1 (1): {0, 1} <- take 1
BS2 (1): {1, 2} <- take 2
Picking Set: {1, 2} <- remove 1, 2
If 1 is instead taken from BS2, the problem becomes unsolvable: the picking set reduces to {2}, and BS1 only contains {0, 1}, making further picks impossible.
A more difficult scenario:
BS1 (1): {1, 2, 4}
BS2 (2): {2, 3, 4}
BS3 (3): {1, 3, 4, 4}
Picking Set: {1, 2, 3, 4, 4, 4}
Possible Solution:
BS1 (1): {1, 2, 4} <- take 1
BS2 (2): {2, 3, 4} <- take 2, 4
BS3 (3): {1, 3, 4, 4} <- take 3, 4, 4
Picking Set: {1, 2, 3, 4, 4, 4} <- remove all
This scenario is solvable in multiple ways, but certain picks will lead to a dead end:
BS1 (1): {1, 2, 4} <- take 1
BS2 (2): {2, 3, 4} <- take 2, 3
BS3 (3): {1, 3, 4, 4} <- take 4, 4, and then dead end
Picking Set: {1, 2, 3, 4, 4, 4} <- remove all but one 4
My solution:
I wrote a recursive brute-force algorithm that tests all picking combinations for the first set, then all combinations for the next set based on that, and so on. It works, but it is slow: the number of combinations explodes. Yet roughly half of the branches are successful, even for large problem instances. This makes me hopeful that a heuristic or other method exists which can construct a valid solution directly.
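A minimal sketch of the recursion (function name and problem encoding are simplified, not my actual code):

```python
from collections import Counter
from itertools import combinations

def brute_force(blocking_sets, picking, i=0):
    # blocking_sets: list of (price, elements); picking: Counter of the picking multiset.
    if i == len(blocking_sets):
        return []                     # every blocking set has been unblocked
    price, elems = blocking_sets[i]
    # Elements of this set still available in the picking multiset.
    avail = sorted((Counter(elems) & picking).elements())
    for combo in sorted(set(combinations(avail, price))):
        rest = brute_force(blocking_sets, picking - Counter(combo), i + 1)
        if rest is not None:
            return [list(combo)] + rest
    return None                       # dead end: backtrack

blocking_sets = [(1, [1, 2, 4]), (2, [2, 3, 4]), (3, [1, 3, 4, 4])]
print(brute_force(blocking_sets, Counter([1, 2, 3, 4, 4, 4])))
# prints [[1], [2, 4], [3, 4, 4]]
```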
My questions:
Upvotes: 3
Views: 131
Reputation: 535
I would place this problem in the class of maximum bipartite matching problems with additional constraints. It's likely impossible to name "the fastest way to construct a valid solution" without knowing limits on the set sizes, the number of blocking sets, and the number of distinct elements and their multiplicities. But I'll give an efficient polynomial-time solution: a reduction to the maximum-flow problem, similar to other problems in that class. This should allow solving your problem even when the total size of the multisets is on the order of 100,000.
Let's define a graph describing our problem.
The picking multiset will be represented by one vertex a_i per distinct element v_i with multiplicity pickcnt_i. The j-th blocked multiset will be represented by one vertex b_jk per distinct element u_jk with multiplicity blockcnt_jk, plus an auxiliary "bottleneck" vertex c_j. price_j will designate the number of elements needed to unblock the j-th blocked multiset. Additionally, a source vertex S and a sink vertex T are defined.
v_i is usable pickcnt_i times, so S is connected to a_i by an edge with capacity pickcnt_i. Similarly, b_jk is connected to c_j with capacity blockcnt_jk. c_j is connected to T with capacity price_j to limit the progress of "partial unblocking". a_i and b_jk are connected by an edge iff v_i == u_jk, with unlimited capacity.
This graph represents a flow network. Let's look at an arbitrary feasible flow. Each unit of flow consumes: a unit of capacity on S -> a_i, modeling removal of a single v_i from the picking multiset; a unit of capacity on b_jk -> c_j, modeling removal of a single u_jk from the j-th blocked multiset; and a unit of capacity on c_j -> T, modeling a single partial unblocking. Hence it is trivial to convert between a feasible flow and a matching of picking-set and blocking-set elements.
Let's look at a maximum flow. It doesn't violate any constraints of the original problem, and its value corresponds to the number of matched elements. So its value can't exceed Σ price_j, can reach Σ price_j only by unblocking all sets, and must reach it if all sets can be unblocked. Therefore the maximum flow gives a solution to the original problem if it saturates all c_j -> T edges; otherwise there is no solution.
There are many algorithms for finding a maximum flow, with complexities favoring dense or sparse graphs and small or unlimited capacities. Many perform better in practice than their complexity would suggest, particularly on special graphs like those produced by a bipartite matching problem; for some such graphs there are additional theorems proving a better complexity. Not knowing the limits, I can't suggest a specific algorithm; I can only describe the size of the reduced problem.
The number of vertices is dominated by the sum of unique element counts over all sets. The number of edges is dominated by the number of valid "initial moves": which element can be used to partially unblock which multiset. The maximum flow is the maximum number of "moves" that can be performed.
To give an example of prewritten, ready-to-use max-flow implementations, you can take a look at Dinic and PushRelabel here.
Upvotes: 3
Reputation: 34829
The approach I would take to solve this problem is a technique I call attacking the branching factor.
I first became aware of this technique while writing a solver for Sudoku, so I'll explain how it works using a Sudoku puzzle as an example. Here's a partially solved Sudoku that was posted on the Puzzling Stack Exchange.
The small grey numbers are the available choices for each empty square. The quantity of the small grey numbers is the branching factor for the square. For example, the empty squares in row 0 (the top row) have a branching factor of 3. Blindly trying every available choice for each square would result in 81 combinations to try.
Now take a look at row 5 (with the highlighted yellow square). Every square on that row has a branching factor of 2, which is only 16 combinations in total. So obviously, it's much better to start with row 5 than to start with row 0. And that's the principle that's at the heart of the technique. Don't blindly start at the upper-left square and work left-to-right top-to-bottom. Instead, identify the square with the smallest branching factor, and work on that square first.
For the example puzzle, the smallest branching factor is 2, and the yellow square happens to be one of the squares with that branching factor. The first choice to try is the 1. Choosing the 1 makes all sorts of wonderful things happen (just follow the blue arrows in the image below):
So by identifying the yellow square as the square with the lowest branching factor, and then choosing the 1 in that square, six squares are filled in with no additional branching. After filling in the six squares, another square with branching factor 2 needs to be chosen, and the process continues.
Applying this technique to the sample puzzle yields an answer in 160 attempts. That's pretty darn fast considering that there are 50 empty squares in the puzzle. Blindly solving the puzzle left-to-right top-to-bottom takes 12,108 attempts. Solving the puzzle in a deliberately bad order takes 640,916,214 attempts.
To summarize the algorithm:
at each level of recursion:
identify the choice in the problem that has the lowest branching factor
for each of the allowed choices:
make the choice
update the branching factors for any other related choices
move to the next level of recursion
Ok, now let's apply the technique to the problem posed in the question:
BS1 (1): {1, 2, 4}
BS2 (2): {2, 3, 4}
BS3 (3): {1, 3, 4, 4}
Picking Set: {1, 2, 3, 4, 4, 4}
There are two types of branching in this problem, the branching within a blocked set (BS1, BS2, BS3), and the branching for the numbers (N1, N2, N3, N4).
Let's examine the sets first:
Now let's look at the numbers:
The best branching factor is 2. N1 has that, and the choices are N1 in BS1, or N1 in BS3. Try N1 in BS1, and update the branching factors:
The lowest branching factor is 1, BS3 has that, so BS3(3,4,4) is forced. After updating the branching factors, we find that BS2(2,4) is forced, and we're done.
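A compact sketch of this strategy applied to the question's problem (function and variable names are mine; as a simplification of the per-number branching above, the branching unit here is a whole blocking set, i.e. the set with the fewest feasible pick combinations is expanded first):

```python
from collections import Counter
from itertools import combinations

def options(price, elems, picking):
    # All distinct ways to take `price` elements of this blocking set
    # from what remains of the picking multiset.
    avail = sorted((Counter(elems) & picking).elements())
    return sorted(set(combinations(avail, price)))

def solve_mrv(blocking_sets, picking, assigned=None):
    assigned = {} if assigned is None else assigned
    remaining = [j for j in range(len(blocking_sets)) if j not in assigned]
    if not remaining:
        return assigned               # every blocking set is unblocked
    # Attack the branching factor: expand the set with the fewest options.
    j = min(remaining, key=lambda j: len(options(*blocking_sets[j], picking)))
    for combo in options(*blocking_sets[j], picking):
        assigned[j] = list(combo)
        result = solve_mrv(blocking_sets, picking - Counter(combo), assigned)
        if result is not None:
            return result
        del assigned[j]               # dead end: undo and try the next combo
    return None

blocking_sets = [(1, [1, 2, 4]), (2, [2, 3, 4]), (3, [1, 3, 4, 4])]
print(solve_mrv(blocking_sets, Counter([1, 2, 3, 4, 4, 4])))
# BS1 takes 1; then BS3 (3, 4, 4) and BS2 (2, 4) are forced
```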
Upvotes: 2