Reputation: 2483
Given the set of integers {1,...,n}, I would like to sample uniformly from the binom{n+k-1}{k} distinct multi-subsets of size k. Is there an efficient way of doing this?
For example, the set {1,2,3} has 6 multi-subsets of size 2. These are {1,2}, {2,3}, {1,3}, {1,1}, {2,2}, {3,3}.
Upvotes: 5
Views: 527
Reputation: 281519
Yes. Since you know there are (n+k-1) choose k such multi-subsets, you are probably aware of the stars and bars combinatorial problem whose solution provides that formula. The solution to that problem suggests a sampling procedure to produce multi-subsets: randomly choose a way to place k stars and n-1 bars, then determine how the bars partition the stars into groups:
import random
import collections
stars = set(random.sample(xrange(n+k-1), k))
multiset = collections.Counter()
# Don't hide the bin builtin.
bin_ = 1
for i in xrange(n+k-1):
if i in stars:
multiset[bin_] += 1
else:
bin_ += 1
This will produce a collections.Counter
counting the number of times each number was chosen. I've initialized bin_ = 1
to produce a multi-subset of {1...n}; bin_ = 0
would produce a multi-subset of {0...n-1}.
(Previously, I posted an answer suggesting the use of a multinomial distribution. That is not the right distribution; it gives too little weight to results with repeated elements. Sorry for the error. Since the ways to place k stars and n-1 bars are in direct correspondence with the multi-subsets of {1...n}, this solution should produce a uniform distribution.)
Upvotes: 3
Reputation: 1672
Unfortunately I cannot test my solution now, it is based on the pervious one:
import random as rd
#n,k - should be predefined
x=[[j]*k for j in xrange(n)] # [1,1,1,1 - k times, 2,2,2,- k times etc.]
rd.sample(x,k)
Upvotes: 0