CircularRefraction

Reputation: 31

How to optimize this set-picking algorithm?

I am trying to solve the following problem:

Minimal example:

//Syntax for blocking sets:
//Name (number of elements required to pick): {elements in set}

//Syntax for picking set:
//Name: {elements in set}

BS1 (1): {0, 1}
BS2 (1): {1, 2}
Picking Set: {1, 2}

Possible solution:

BS1 (1): {0, 1}     <- take 1
BS2 (1): {1, 2}     <- take 2
Picking Set: {1, 2} <- remove 1, 2

If 1 is instead picked from BS2, the problem becomes unsolvable: the picking set reduces to {2}, and since BS1 only contains {0, 1}, no further pick is possible.

A more difficult scenario:

BS1 (1): {1, 2, 4} 
BS2 (2): {2, 3, 4} 
BS3 (3): {1, 3, 4, 4} 
Picking Set: {1, 2, 3, 4, 4, 4}

Possible Solution:

BS1 (1): {1, 2, 4}              <- take 1
BS2 (2): {2, 3, 4}              <- take 2, 4 
BS3 (3): {1, 3, 4, 4}           <- take 3, 4, 4
Picking Set: {1, 2, 3, 4, 4, 4} <- remove all

This scenario is solvable in multiple ways, but certain picks will lead to a dead end:

BS1 (1): {1, 2, 4}              <- take 1
BS2 (2): {2, 3, 4}              <- take 2, 3 
BS3 (3): {1, 3, 4, 4}           <- take 4, 4, and then dead end
Picking Set: {1, 2, 3, 4, 4, 4} <- remove all but one 4

My solution:

I wrote a recursive brute-force algorithm that tests all picking combinations for the first set, then all combinations of the next set based on that, and so on. It works, but it is slow: the number of combinations explodes. Yet roughly half of the branches are successful, even for large problem instances, which makes me hopeful that a heuristic or other method exists that can construct a valid solution directly.
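For reference, a minimal sketch of such a recursive brute force, assuming Python (the function and variable names are illustrative, not from the original code). Each blocked set is a pair `(required_count, elements)`, and the picking multiset is a `Counter`:

```python
from itertools import combinations
from collections import Counter

def solve(blocked_sets, picking):
    """Recursive brute force: for the first blocked set, try every distinct
    way of taking the required number of its elements from the remaining
    picking multiset, then recurse on the rest."""
    if not blocked_sets:
        return []                               # every set unblocked
    (need, elems), rest = blocked_sets[0], blocked_sets[1:]
    # only elements both in this blocked set and still in the picking multiset
    pool = list((Counter(elems) & picking).elements())
    tried = set()
    for combo in combinations(pool, need):
        key = tuple(sorted(combo))
        if key in tried:                        # skip duplicate multisets
            continue
        tried.add(key)
        sub = solve(rest, picking - Counter(combo))
        if sub is not None:
            return [key] + sub
    return None                                 # dead end, backtrack

blocked = [(1, [1, 2, 4]), (2, [2, 3, 4]), (3, [1, 3, 4, 4])]
print(solve(blocked, Counter([1, 2, 3, 4, 4, 4])))
```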

My questions:

Is there a heuristic or other method that can construct a valid solution directly, without brute-force search?

Upvotes: 3

Views: 131

Answers (2)

maxplus

Reputation: 535

I would place this problem in the class of maximum bipartite matching problems with additional constraints. It's likely impossible to give "the fastest way to construct a valid solution" without knowing limits on set sizes, the number of blocking sets, and the number of distinct elements and their multiplicities. But I'll give an efficient polynomial-time solution: a reduction to the maximum-flow problem, similar to other problems in that class. This should allow solving your problem even for a total multiset size on the order of 100,000.

Definitions

Let's define a graph describing our problem.

The picking multiset will be represented by one vertex a_i per distinct element v_i with multiplicity pickcnt_i. The j-th blocked multiset will be represented by one vertex b_jk per distinct element u_jk with multiplicity blockcnt_jk, plus an auxiliary "bottleneck" vertex c_j. price_j denotes the number of elements required to unblock the j-th blocked multiset. Additionally, a source vertex S and a sink vertex T are defined.

v_i is usable pickcnt_i times, so S is connected to a_i by an edge with capacity pickcnt_i. Similarly b_jk is connected to c_j with capacity blockcnt_jk. c_j is connected to T with capacity price_j to limit the progress of "partial unblocking". a_i and b_jk are connected by an edge iff v_i == u_jk, with unlimited capacity.

Interpreting a flow

This graph represents a flow network. Let's look at an arbitrary feasible flow. Each unit of flow consumes: a unit of capacity on S->a_i, modeling removal of a single v_i from the picking multiset; a unit of capacity on b_jk->c_j, modeling removal of a single u_jk from the j-th blocked multiset; and a unit of capacity on c_j->T, modeling a single partial unblocking. Hence it is trivial to convert between a feasible flow and a matching of picking-set and blocking-set elements.

Let's look at a maximum flow. It doesn't violate any constraints from our original problem, and its value corresponds to the number of matched elements. So its value can't be higher than Σprice_j, can reach Σprice_j only by unblocking all sets, and must reach it if all sets can be unblocked. Therefore the maximum flow gives a solution to the original problem if it saturates every c_j->T edge, and otherwise there is no solution.
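The reduction above can be sketched in Python. This is a sketch under stated assumptions: `Dinic` is a compact self-contained max-flow implementation (not a library call), and `unblockable` builds the network exactly as described, returning whether all sets can be unblocked:

```python
from collections import Counter, deque

class Dinic:
    """Compact Dinic's max-flow: BFS level graph + DFS blocking flows."""
    def __init__(self, n):
        self.n, self.g = n, [[] for _ in range(n)]
    def add_edge(self, u, v, cap):
        self.g[u].append([v, cap, len(self.g[v])])       # forward edge
        self.g[v].append([u, 0, len(self.g[u]) - 1])     # residual edge
    def _bfs(self, s, t):
        self.level = [-1] * self.n
        self.level[s] = 0
        q = deque([s])
        while q:
            u = q.popleft()
            for v, cap, _ in self.g[u]:
                if cap > 0 and self.level[v] < 0:
                    self.level[v] = self.level[u] + 1
                    q.append(v)
        return self.level[t] >= 0
    def _dfs(self, u, t, f):
        if u == t:
            return f
        while self.it[u] < len(self.g[u]):
            e = self.g[u][self.it[u]]
            if e[1] > 0 and self.level[e[0]] == self.level[u] + 1:
                d = self._dfs(e[0], t, min(f, e[1]))
                if d > 0:
                    e[1] -= d
                    self.g[e[0]][e[2]][1] += d
                    return d
            self.it[u] += 1
        return 0
    def max_flow(self, s, t):
        flow = 0
        while self._bfs(s, t):
            self.it = [0] * self.n
            f = self._dfs(s, t, float('inf'))
            while f:
                flow += f
                f = self._dfs(s, t, float('inf'))
        return flow

def unblockable(blocked_sets, picking):
    """Build the network described above and check whether the maximum
    flow saturates every c_j -> T edge (i.e. equals the sum of price_j)."""
    pick = Counter(picking)
    a = {}                                    # picking element -> a_i vertex
    nid = 1                                   # vertex 0 is the source S
    for v in pick:
        a[v] = nid
        nid += 1
    sets = []                                 # (c_j, {u: b_jk}, counts, price_j)
    for price, elems in blocked_sets:
        cnt = Counter(elems)
        bjk = {}
        for u in cnt:
            bjk[u] = nid
            nid += 1
        sets.append((nid, bjk, cnt, price))   # nid is the bottleneck c_j
        nid += 1
    S, T = 0, nid
    net = Dinic(nid + 1)
    for v, c in pick.items():
        net.add_edge(S, a[v], c)              # S -> a_i, capacity pickcnt_i
    for cj, bjk, cnt, price in sets:
        for u, c in cnt.items():
            net.add_edge(bjk[u], cj, c)       # b_jk -> c_j, capacity blockcnt_jk
            if u in a:
                net.add_edge(a[u], bjk[u], 10 ** 9)  # matching edge, "infinite"
        net.add_edge(cj, T, price)            # c_j -> T, capacity price_j
    return net.max_flow(S, T) == sum(p for _, _, _, p in sets)
```

To recover the actual assignment rather than just feasibility, read off the residual flow on the a_i -> b_jk edges after running `max_flow`.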

Complexity

There are many algorithms for finding a maximum flow with complexities favoring dense or sparse graphs with small or unlimited capacities. Many perform in practice better than their complexity would suggest, particularly on special graphs like those produced by a bipartite matching problem. For some such graphs there are additional theorems proving a better complexity. Not knowing the limits I can't suggest a specific algorithm, only describe the size of the reduced problem.

The number of vertices is dominated by the sum of distinct-element counts over all sets. The number of edges is dominated by the number of valid "initial moves": which element can be used to partially unblock which multiset. The maximum flow is the maximum number of "moves" that can be performed.

To give an example of prewritten, ready-to-use max-flow implementations, you can take a look at Dinic and PushRelabel here.

Upvotes: 3

user3386109

Reputation: 34829

The approach I would take to solve this problem is a technique I call attacking the branching factor.

I first became aware of this technique while writing a solver for Sudoku. So I'll explain how it works using a Sudoku puzzle as an example. Here's a partially solved Sudoku that was posted on the puzzling stack exchange.

[image: partially solved Sudoku grid, with small grey candidate numbers in each empty square]

The small grey numbers are the available choices for each empty square. The quantity of the small grey numbers is the branching factor for the square. For example, the empty squares in row 0 (the top row) have a branching factor of 3. Blindly trying every available choice for each square would result in 81 combinations to try.

Now take a look at row 5 (with the highlighted yellow square). Every square on that row has a branching factor of 2, which is only 16 combinations in total. So obviously, it's much better to start with row 5 than to start with row 0. And that's the principle that's at the heart of the technique. Don't blindly start at the upper-left square and work left-to-right top-to-bottom. Instead, identify the square with the smallest branching factor, and work on that square first.

For the example puzzle, the smallest branching factor is 2, and the yellow square happens to be one of the squares with that branching factor. The first choice to try is the 1. Choosing the 1 makes all sorts of wonderful things happen (just follow the blue arrows in the image below):

  • the branching factor of the square at {5,1} is reduced to 1, forcing the 4
  • then the square at {3,2} is reduced to 1, forcing the 3
  • then the square at {4,1} is reduced to 1, forcing the 1
  • then the square at {6,1} is reduced to 1, forcing the 3
  • then the square at {6,0} is reduced to 1, forcing the 1
  • and as a bonus, 10 other squares have their branching factors reduced (indicated by the red x's)

[image: the same Sudoku after choosing the 1, with blue arrows marking the chain of forced squares and red x's marking eliminated candidates]

So by identifying the yellow square as the square with the lowest branching factor, and then choosing the 1 in that square, six squares are filled in with no additional branching. After filling in the six squares, another square with branching factor 2 needs to be chosen, and the process continues.

Applying this technique to the sample puzzle yields an answer in 160 attempts. That's pretty darn fast considering that there are 50 empty squares in the puzzle. Blindly solving the puzzle left-to-right top-to-bottom takes 12,108 attempts. Solving the puzzle in a deliberately bad order takes 640,916,214 attempts.

To summarize the algorithm:

at each level of recursion:
   identify the choice in the problem that has the lowest branching factor
   for each of the allowed choices:
       make the choice
       update the branching factors for any other related choices
       move to the next level of recursion

Ok, now let's apply the technique to the problem posed in the question:

BS1 (1): {1, 2, 4} 
BS2 (2): {2, 3, 4} 
BS3 (3): {1, 3, 4, 4} 
Picking Set: {1, 2, 3, 4, 4, 4}

There are two types of branching in this problem, the branching within a blocked set (BS1, BS2, BS3), and the branching for the numbers (N1, N2, N3, N4).

Let's examine the sets first:

  • BS1 has a branching factor of 3, since one out of the three numbers must be chosen.
  • BS2 has a branching factor of 3, since two out of the three numbers must be chosen.
  • BS3 has a branching factor of 3, the choices are {1,3,4}, {1,4,4}, or {3,4,4}.

Now let's look at the numbers:

  • N1 appears in two sets, but only appears once in the picking set. So N1 has a branching factor of 2.
  • N2 also has a branching factor of 2.
  • N3 also has a branching factor of 2.
  • N4 has a branching factor of 3, the choices are {BS1, BS2, BS3}, {BS1, BS3, BS3}, or {BS2, BS3, BS3}.

The best branching factor is 2. N1 has that, and the choices are N1 in BS1, or N1 in BS3. Try N1 in BS1, and update the branching factors:

  • BS1 is eliminated
  • BS2's branching factor is not affected
  • BS3's branching factor is reduced to 1
  • N1 is eliminated
  • N2's branching factor is reduced to 1
  • N3's branching factor is not affected
  • N4's branching factor is reduced to 1

The lowest branching factor is 1, BS3 has that, so BS3(3,4,4) is forced. After updating the branching factors, we find that BS2(2,4) is forced, and we're done.
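The walkthrough above can be sketched as a lowest-branching-factor backtracking solver for the set-picking problem. This is a simplified sketch in Python (names like `solve_mrv` and `distinct_choices` are mine): for brevity it scores only the blocked sets' branching factors, not the numbers', but it follows the same principle of always expanding the choice with the fewest alternatives:

```python
from itertools import combinations
from collections import Counter

def distinct_choices(need, elems, picking):
    """All distinct multisets of `need` elements of this blocked set
    that are still available in the picking multiset."""
    pool = list((Counter(elems) & picking).elements())
    return sorted({tuple(sorted(c)) for c in combinations(pool, need)})

def solve_mrv(blocked_sets, picking):
    """Backtracking that always expands the blocked set with the lowest
    branching factor (fewest remaining distinct choices) first."""
    if not blocked_sets:
        return {}
    scored = [(len(distinct_choices(need, elems, picking)), i)
              for i, (name, need, elems) in enumerate(blocked_sets)]
    bf, i = min(scored)                      # lowest branching factor
    if bf == 0:
        return None                          # some set has no valid pick
    name, need, elems = blocked_sets[i]
    rest = blocked_sets[:i] + blocked_sets[i + 1:]
    for combo in distinct_choices(need, elems, picking):
        sub = solve_mrv(rest, picking - Counter(combo))
        if sub is not None:                  # success: record this pick
            sub[name] = combo
            return sub
    return None                              # all choices dead-end; backtrack

blocked = [("BS1", 1, [1, 2, 4]), ("BS2", 2, [2, 3, 4]), ("BS3", 3, [1, 3, 4, 4])]
print(solve_mrv(blocked, Counter([1, 2, 3, 4, 4, 4])))
```

On the question's example this reproduces the walkthrough: BS1 is expanded first, and after taking its 1, BS3's branching factor drops to 1 and the rest of the solution is forced.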

Upvotes: 2
