Daniel
Daniel

Reputation: 1837

Output of code using System.Random does not approach theoretical limit as iterations increase

I'm not good at stats, so I tried to solve a simple problem in C#. The problem: "A given team has a 65% chance to win a single game against another team. What is the probability that they will win a best-of-5 set?"

I wanted to look at the relationship between that probability and the number of games in the set. How does a Bo3 compare to a Bo5, and so on?

I did this by creating Set and Game objects and running iterations. The win decision is done with this code:

Won = rnd.Next(1, 100) <= winChance;

rnd is, as you might expect, a static System.Random object.

Here's my Set object code:

public class Set
{
    public int NumberOfGames { get; private set; }
    public List<Game> Games { get; private set; }
    public Set(int numberOfGames, int winChancePct)
    {
        NumberOfGames = numberOfGames;
        GamesNeededToWin = Convert.ToInt32(Math.Ceiling(NumberOfGames / 2m));
        Games = Enumerable.Range(1, numberOfGames)
                          .Select(i => new Game(winChancePct))
                          .ToList();
    }

    public int GamesNeededToWin { get; private set; }
    public bool WonSet => Games.Count(g => g.Won) >= GamesNeededToWin;
}

My issue is that the results I get aren't quite what they should be. Someone who sucks less at stats did the math for me, and it seems my code is always overestimating the chance of winning the set, and the number of iterations doesn't improve the accuracy.

The results I get (% set win by games per set) are below. The first column is the games per set, the next is the statistical win rate (which my results should be approaching), and the remaining columns are my results based on the number of iterations. As you can see, more iterations don't seem to be making the numbers more accurate.

Games Per Set|Expected Set Win Rate|10K|100K|1M|10M

1 65.0% 66.0% 65.6% 65.7% 65.7%

3 71.8% 72.5% 72.7% 72.7% 72.7%

5 76.5% 78.6% 77.4% 77.5% 77.5%

7 80.0% 80.7% 81.2% 81.0% 81.1%

9 82.8% 84.1% 83.9% 83.9% 83.9%

The entire project is posted on github here if you want to look.

Any insight into why this isn't producing accurate results would be greatly appreciated.

Upvotes: 1

Views: 257

Answers (2)

Eric Lippert
Eric Lippert

Reputation: 660128

The answer of Darren Sisson is correct; your computation is off by approximately 1%, and so all your results are as well.

My recommendation is that you solve the problem by encapsulating your desired semantics into an object which you can then test independently:

interface IDistribution<T>
{
  T Sample();
}
static class Extensions 
{
  public static IEnumerable<T> Samples(this IDistribution<T> d)
  {
    while (true) yield return d.Sample();
  }
}
class Bernoulli : IDistribution<bool>
{
  // Note that we could also make it IDistribution<int> and return
  // 0 and 1 instead of false and true; that would be the more 
  // "classic" approach to a Bernoulli distribution.  Your choice.
  private double d;
  private Random random = new Random();
  private Bernoulli(double d) { this.d = d; }
  public static Make(double d) => new Bernoulli(d);
  public bool Sample() => random.NextDouble() < d;
}

And now you have a biased coin flipper which you can test independently. You can now write code like:

int flips = 1000;
int heads = Bernoulli
  .Make(0.65)
  .Samples()
  .Take(flips)
  .Where(x => x)
  .Count();

to do 1000 coin flips with a 65% chance of heads.

Notice that what we are doing here is constructing a probability distribution monad and then using the tools of LINQ to express a conditional probability. This is a powerful technique; your application barely scratches the surface of what we can do with it.

Exercise: construct extension methods Where, Select and SelectMany which take not IEnumerable<T> but rather IDistribution<T>; can you express the semantics of the distribution in terms of the distribution type itself, rather than making a transformation from the distribution monad to the sequence monad? Can you do the same for zip joins?

Exercise: construct other implementations of IDistribution<T>. Can you construct, say, a Cauchy distribution of doubles? What about a normal distribution? What about a dice-rolling distribution on a fair die of n sides? Now, can you put this all together? What is the distribution which is: flip a coin; if heads, roll four dice and add them together, otherwise roll two dice and discard all the doubles, and multiply the results.

Upvotes: 6

Darren Sisson
Darren Sisson

Reputation: 26

quick look, The Random function's upper bound is exclusive so would need to be set to 101

Upvotes: 1

Related Questions