Alecto
Alecto

Reputation: 10750

Uniformly selecting a distribution of items into bins

Imagine that you have n items and m bins. All items are identical, however the bins are distinct. What's the fastest algorithm to randomly select an allotment of items into bins?

For example - imagine 104 is a placement of 5 items into 3 bins. There are 21 possible placements of 5 items into 3 bins:

005  014  023
032  041  050
104  113  122
131  140  203
212  221  230
302  311  320
401  410  500

For large numbers of items and bins, how should I select a placement of n items into m bins such that each possible placement has an equal chance of occurring? For a given n and m selection will be done a large number of times.

Upvotes: 3

Views: 1078

Answers (2)

Rory Daulton
Rory Daulton

Reputation: 22564

Here are two algorithms for two situations.

You have much memory available and you will do many placements. In this case, you can use your memory to pre-calculate and store all possible placements of your n items into m bins. If we let C(n, r) be the number of combinations of r items taken from n items, without repetition and without regard to order, then the number of possible placements is C(m+n-1, m-1). (That formula is pretty standard in combinatorics and uses the stars and bars method). In your example, that is

C(5+3-1, 3-1) = C(7, 2) = 7!/2!/(7-2)! = 7/1*6/2 = 21

(If MathJax were available in StackOverflow I could make that look much more pretty with standard math notation.) In the setup to your program, those placements can be generated with a small recursive routine with this idea--place k items (0 <= k <= n) into the first bin, then place the remaining n-k items into the remaining m-1 bins. Then each time you want a random placement, choose a random number between 1 and C(m+n-1, m-1) and use that as an index to choose the placement. The time cost per additional placement is just one random number calculation and one array access. You can't get much better than that.

You have little memory available and you will do one or few placements. Then you can choose your random placement with an iterative routine that calculates multiple combination coefficients.

First, calculate the number of possible placement of your n items into m bins, C(m+n-1, m-1), and choose a random number r from 1 to that combinations number. Let k be the number of items to place in the first bin. There will then be n-k remaining items in m-1 bins, which has a count of C(m+n-k-2, m-2). Now cycle k starting at 0. If that count for k=0 exceeds or equals r, we decide to set k=0 items in the first bin. If not, increase k by one and decrease r by that combinations count, and find a new combinations count for the new k. If that count exceeds or equals r, we decide to set k items in the first bin. If not, increase k by one and ... you get the idea. When we have chosen a particular value of k we replace n with n-k, m with m-1, leave r as it now is, and move to the next bin.

The count for this is n iterations through the items and m iterations through the bins, for m+n iterations and calculations of combination coefficients. The only memory usage is a few simple variables and the final placement into m bins. You want a good combination coefficients calculator routine.

(ADDED LATER: I have fully coded this routine and found a better way to calculate the probabilities involved without finding the numbers of combinations. This reduces the calculation time and fully achieves the order m+n for the routine. This could be reduced to order m if I could find a good function to directly find a value that gives a certain probability, but I am unable to find such a function. I could find approximations, if you are willing to have a near-uniform distribution of placements rather than a fully uniform distribution.)

Let me know if you want some Python code demonstrating either routine, but clarify which situation you are in and show some more of your own effort first.

Upvotes: 2

Five identical items can be distributed over three distinct bins in 35 = 243 ways, each leading to one of these 21 distributions:

500   410   320   311   221  
050   401   302   131   212  
005   140   230   113   122  
      104   203  
      041   032  
      014   023  

You'll notice that there are two mechanisms at work here: firstly the number 5 can be partitioned into a maximum of 3 parts in 5 different ways (the columns), and secondly each such partition has a number of permutations (the rows).

To enumerate the partitions with restricted number of parts, use a recursive algorithm, e.g. with 5 items and 3 bins:

5 items in 1 bin  
4 items in 1 bin + recurse with 1 item in 2 bins  
3 items in 1 bin + recurse with 2 items in 2 bins
2 items in 1 bin + recurse with 3 items in 2 bins

Generate only non-rising sequences like [2,2,1], not [2,1,2] or [0,2,3], by never putting a number of items in the current bin that is smaller than the number of items divided by the number of bins (that's why there's no option with 1 item in the first bin in the example above).

(Partitioning can be speeded up by using memoization.)

To calculate the probability of each partition (i.e. the number of permutations), divide the factorial of the number of bins by the product of the factorials of the number of bins with a certain number of items:

5,0,0   3! / (1! x 2!)      = 3  
4,1,0   3! / (1! x 1! x 1!) = 6  
3,2,0   3! / (1! x 1! x 1!) = 6  
3,1,1   3! / (1! x 2!)      = 3  
2,2,1   3! / (2! x 1!)      = 3  
                             --
                             21

Then, choose a random number from 1 to 21, and select the corresponding partition and its permutation; e.g. choosing 13 would mean partition [3,2,0] and its fourth permutation [2,0,3].

So instead of enumerating all (243 in the 5:3 example) options or all (21) distributions, we enumerate the (5) partitions, and maybe the (up to 6) permutations if there's no faster way to get to the n-th unique permutation. For large numbers, this should make a huge difference.

(For details and code examples for some of these steps, see this answer to a related question.)

Upvotes: 0

Related Questions