Rohit Pandey
Rohit Pandey

Reputation: 2681

What's wrong with this simple method to sample from multinomial in C#?

I wanted to implement a simple method to sample from a multinomial distribution in C# (the first argument is an array of integers we want to sample and the second one is the probabilities of selecting each of those integers).

When I do this with numpy in python, the results make sense.

np.random.choice(np.array([1,2,3,4,5,6]),p=np.array([.624,.23,.08,.04, .02, .006]),size=len(b))

I get a lot of 1's (probability 62%), a bunch of 2's, some 3's etc.

However, when I try the implementation below in C# (pretty straightforward inverse transform sampling for multinomial, only relies on a uniform random variable), I get really weird results. For all 1000 samples, I'll often find all 1's. Sometimes, I'll find all 3's (!!??). The results never look like what you would expect (and what you get from the python function - try running it yourself a few times). This is really scary since we rely on these primitives. Does anyone have insight into what might be wrong with the C# version?

    static void Main(string[] args)
    {
        int[] iis = new int[7];
        int[] itms = new int[] { 1, 2, 3, 4, 5, 6 };
        double[] probs = new double[] { .624, .23, .08, .04, .02, .006 };
        for (int i = 0; i < 1000; i++)
        {
            iis[MultinomialSample(itms, probs)] += 1;
        }

        foreach (var ii in iis)
        {
            Console.Write(ii + ",");
        }

        Console.Read();
    }


     private static int MultinomialSample(int[] s, double[] ps)
    {
        double[] cumProbs = new double[ps.Length];
        cumProbs[0] = ps[0];

        for (int i = 1; i < ps.Length; i++)
        {
            cumProbs[i] = cumProbs[i - 1] + ps[i];
        }

        Random random = new Random();
        double u = random.NextDouble();

        for (int i = 0; i < cumProbs.Length - 1; i++)
        {
            if (u < cumProbs[i])
            {
                return s[i];
            }
        }

        return s[s.Length - 1];
    }

Upvotes: 0

Views: 794

Answers (2)

Joe Blogs
Joe Blogs

Reputation: 41

I used Rufus' code in a simulation I was working on and noticed there is still a problem, even after seeding the random number generator just once (which is the correct thing to do). You will notice that as we are iterating, the call to random.NextDouble() generates a new random number each time. This is wrong.

for (var i = 0; i < cumProbs.Length - 1; i++)
{
    if (random.NextDouble() < cumProbs[i])
    {
        return sample[i];
    }
}

The random number should be generated outside of the loop, as follows:

var r = random.NextDouble();
for (var i = 0; i < cumProbs.Length - 1; i++)
{
    if (r < cumProbs[i])
    {
        return sample[i];
    }
}

You can compare it to the Excel algorithm given on Wikipedia: https://en.wikipedia.org/wiki/Multinomial_distribution. When I made the above change to Rufus' code, I got the desired frequency distribution as specified by the probabilities array.

Upvotes: 1

Rufus L
Rufus L

Reputation: 37050

You're initializing Random each time you call MultinomialSample. If these calls are very close together, Random will be initialized with the same seed (based on the system clock). Try either making Random a private class field: private static Random random = new Random(); or pass it into the method as an argument from Main, where it would be initialized only once:

private static Random random = new Random();

private static int MultinomialSample(IReadOnlyList<int> sample, 
    IReadOnlyList<double> probabilities)
{
    var cumProbs = new double[probabilities.Count];
    cumProbs[0] = probabilities[0];

    for (var i = 1; i < probabilities.Count; i++)
    {
        cumProbs[i] = cumProbs[i - 1] + probabilities[i];
    }

    for (var i = 0; i < cumProbs.Length - 1; i++)
    {
        if (random.NextDouble() < cumProbs[i])
        {
            return sample[i];
        }
    }

    return sample[sample.Count - 1];
}

private static void Main()
{
    var iis = new int[7];
    var items = new[] {1, 2, 3, 4, 5, 6};
    var probabilities = new[] {.624, .23, .08, .04, .02, .006};

    for (int i = 0; i < 1000; i++)
    {
        iis[MultinomialSample(items, probabilities)] ++;
    }

    Console.WriteLine(string.Join(", ", iis));

    Console.WriteLine("\nDone!\nPress any key to exit...");
    Console.ReadKey();
}

Upvotes: 2

Related Questions