Reputation: 1286
Help me find a good algorithm?
I have a bag full of n balls. (Let's say 28 balls for an example.)
The balls in this bag each have 1 color. There are <= 4 different colors of balls in the bag. (Let's say red, green, blue, and purple are possibilities.)
I have three buckets, each with a number of how many balls they need to end up with. These numbers total n. (For example, let's say bucket A needs to end up with 7 balls, bucket B needs to end up with 11 balls, bucket C needs to end up with 10 balls.)
The buckets also may or may not have color restrictions - colors that they will not accept. (Bucket A doesn't accept purple balls or green balls. Bucket B doesn't accept red balls. Bucket C doesn't accept purple balls or blue balls.)
I need to distribute the balls efficiently and randomly (equal probability of all possibilities).
I can't just randomly put balls in buckets that have space to accept them, because that could bring me to a situation where the only bucket that has space left in it does not accept the only color that is left in the bag.
It is given that there is always at least 1 possibility for distributing the balls. (I will not have a bag of only red balls and some bucket with number > 0 doesn't accept red balls.)
All of the balls are considered distinct, even if they are the same color. (One of the possibilities where bucket C gets red ball 1 and not red ball 2 is different from the possibility where everything is the same except bucket C gets red ball 2 and not red ball 1.)
Edit to add my idea:
I don't know if this has equal probability of all possibilities, as I would like. I haven't figured out the efficiency - It doesn't seem too bad. And this contains an assertion that I'm not sure if it's always true. Please comment on any of these things if you know.
Choose a ball from the bag at random. (Call it "this ball".)
If this ball fits and is allowed in a number of buckets > 0:
Choose one of those buckets at random and put this ball in that bucket.
else (this ball is not allowed in any bucket that it fits in):
Make a list of colors that can go in buckets that are not full.
Make a list of balls of those colors that are in full buckets that this ball is allowed in.
If that 2nd list is length 0 (There are no balls of colors from the 1st list in the bucket that allows the color of this ball):
ASSERT: (Please show me an example situation where this might not be the case.)
There is a 3rd bucket that is not involved in the previously used buckets in this algorithm.
(One bucket is full and is the only one that allows this ball.
A second bucket is the only one not full and doesn't allow this ball or any ball in the first bucket.
The 3rd bucket is full must allow some color that is in the first bucket and must have some ball that is allowed in the second bucket.)
Choose, at random, a ball from the 3rd bucket balls of colors that fit in the 2nd bucket, and move that ball to the 2nd bucket.
Choose, at random, a ball from the 1st bucket balls of colors that fit in the 3rd bucket, and move that ball to the 3rd bucket.
Put "this ball" (finally) in the 1st bucket.
else:
Choose a ball randomly from that list, and move it to a random bucket that is not full.
Put "this ball" in a bucket that allows it.
Next ball.
Upvotes: 4
Views: 2402
Reputation: 13576
In some cases at least, this problem can be solved quite quickly by first using the constraints to reduce the problem to a more manageable size, then searching the solution space.
First, note that we can ignore the distinctness of the balls for the main part of the algorithm. Having found a solution only considering color, it’s trivial to randomly assign distinct ball numbers per color by shuffling within each color.
To restate the problem and clarify the notion of equal probability, here is a naive algorithm that is simple and correct but potentially very inefficient:
This samples from the space of all permutations with equal probability and filters out the ones that don’t meet the constraints, so uniform probability is satisfied. However, given even moderately severe constraints, it might loop many millions of times before finding a solution. On the other hand, if the problem is not very constrained, it will find a solution quickly.
We can exploit both of these facts by first examining the constraints and the number of balls of each color. For example, consider the following:
In a trial run with these parameters, the naive algorithm failed to find a valid solution within 20 million iterations. But now let us reduce the problem.
Note that all 6 purple balls must go in B, because it’s the only bucket that can accept them. So the problem reduces to:
C needs 10 balls, and can only take red and green. There are 6 of each. The possible counts are 4+6, 5+5, 6+4. So we must put at least 4 red and 4 green in C.
We have to put 10 blue balls somewhere. C won’t take any. B can take 5 at most; the other 5 must go in A. A can take 7 at most; the other 3 must go in B. Thus, A must take at least 5 blue, and B must take at least 3 blue.
At this point, the problem is trivial: checking random solutions to the reduced problem will find a valid solution within a few tries.
For the fully-reduced problem, 80 out of 720 permutations are valid, so a valid solution will be found with probability 1/9. For the original problem, out of 28! permutations there are 7! * 11! * 10! * 80 valid solutions, and the probability of finding one is less than one in five billion.
Turning the human reasoning used above into a reducing algorithm is more difficult, and I will only consider it briefly. Generalizing from the specific cases above:
If these don’t reduce a specific problem sufficiently, examination of the problem may yield other reductions that can then be coded.
Finally: will this always work well? It’s hard to be sure, but I suspect it will, in most cases, because the constraints are what cause the naive algorithm to fail. If we can use the constraints to reduce the problem to one where the constraints don’t matter much, the naive algorithm should find a solution without too much trouble; the number of valid solutions should be a reasonably large fraction of all the possibilities.
Afterthought: the same reduction technique would improve the performance of the other answers here, too, assuming they’re correct.
Upvotes: 0
Reputation: 65458
Here's an O(n^3)-time algorithm. (The 3 comes from the number of buckets.)
We start by sketching a brute-force enumeration algorithm, then extract an efficient counting algorithm, then show how to sample.
We enumerate with an algorithm that has two nested loops. The outer loop iterates through the balls. The color of each ball does not matter; only that it can be placed in certain buckets but not others. At the beginning of each outer iteration, we have a list of partial solutions (assignments of the balls considered so far to buckets). The inner loop is over partial solutions; we add several partial solutions to a new list by extending the assignment in all valid ways. (The initial list has one element, the empty assignment.)
To count solutions more efficiently, we apply a technique called dynamic programming or run-length encoding depending on how you look at it. If two partial solutions have the same counts in each bucket (O(n^3) possibilities over the life of the algorithm), then all valid extensions of one are valid extensions of the other and vice versa. We can annotate the list elements with a count and discard all but one representative of each "equivalence class" of partial solutions.
Finally, to get a random sample, instead of choosing the representative arbitrarily, when we are combining two list entries, we sample the representative from each side proportionally to that side's count.
Working Python code (O(n^4) for simplicity; there are data structural improvements possible).
#!/usr/bin/env python3
import collections
import random
def make_key(buckets, bucket_sizes):
return tuple(bucket_sizes[bucket] for bucket in buckets)
def sample(balls, final_bucket_sizes):
buckets = list(final_bucket_sizes)
partials = {(0,) * len(buckets): (1, [])}
for ball in balls:
next_partials = {}
for count, partial in partials.values():
for bucket in ball:
next_partial = partial + [bucket]
key = make_key(buckets, collections.Counter(next_partial))
if key in next_partials:
existing_count, existing_partial = next_partials[key]
total_count = existing_count + count
next_partials[key] = (total_count, existing_partial if random.randrange(total_count) < existing_count else next_partial)
else:
next_partials[key] = (count, next_partial)
partials = next_partials
return partials[make_key(buckets, final_bucket_sizes)][1]
def test():
red = {'A', 'C'}
green = {'B', 'C'}
blue = {'A', 'B'}
purple = {'B'}
balls = [red] * 8 + [green] * 8 + [blue] * 8 + [purple] * 4
final_bucket_sizes = {'A': 7, 'B': 11, 'C': 10}
return sample(balls, final_bucket_sizes)
if __name__ == '__main__':
print(test())
Upvotes: 1
Reputation: 3079
1 First you choose 7 between 28: you have C28,7 =1184040 possibilities.
2 Second, you choose 11 between remaining 21: you have C21,11=352716 possibilities.
3 remaining 10 elements are in bucket C.
At each step, if your choice doesnt fit the rules, you stop and do it again.
Everything makes 417629852640 possibilities (without restriction).
It is not very efficient, but for one choice, it doesnt matter a lot. If restrictions are not too restrictive, you do not lose too much time.
If there are very few solutions, you must restrict combinations (only good colors).
Upvotes: 0
Reputation: 495
I am not really sure what is the trade off you want between a random, a correct and an efficient distribution.
If you want a completely random distribution just pick a ball and put it randomly in a bucket it can go in. It would be pretty efficient but you may easily make a bucket overflow.
If you want to be sure to be correct and random you could try to get all distribution possible correct and pick one of it randomly, but it could be very inefficient since the basic brute force algorithm to create all distribution possibility would nearly be at a complexity of NumberOfBucket^NumberOfBalls.
A better algorithm to create all correct case would be to try to build all case verifying your two rules (a bucket B1 can only have N1 balls & a bucket only accept certain colors) color by color. For instance:
//let a distribution D be a tuple N1,...,Nx of the current number of balls each bucket can accept
void DistributeColor(Distribution D, Color C) {
DistributeBucket(D,B1,C);
}
void DistributeBucket(Distribution D, Bucket B, Color C) {
if B.canAccept(C) {
for (int i = 0; i<= min(D[N],C.N); i++) {
//we put i balls of the color C in the bucket B
C.N-=i;
D.N-=i;
if (C.N == 0) {
//we got no more balls of this color
if (isLastColor(C)){
//this was the last color so it is a valid solution
save(D);
} else {
//this was not the last color, try next color
DistributeColor(D,nextColor(C))
}
} else {
//we still got balls
if (isNotLastBucket(B)) {
//This was not the last bucket let's try to fill the next one
DistributeBucket(D, nextBucket(B), C)
} else {
//this was the last bucket, so this distibution is not a solution. Let's do nothing (please don't kill yourself :/ )
}
}
//reset the balls for the next try
C.N+=i;
D.N+=i;
}
}
//it feel like déjà vu
if (isNotLastBucket(B)) {
//This was not the last bucket let's try to fill the next one
DistributeBucket(D, nextBucket(B), C)
} else {
//this was the last bucket, so this distribution is not a solution.
}
}
(This code is pseudo C++ and is not meant to runnable)
Upvotes: 0