Artavazd
Artavazd

Reputation: 175

Random number within a range biased towards the minimum value of that range

I want to generate random numbers within a range (1 - 100000), but instead of purely random I want the results to be based on a kind of distribution. What I mean that in general I want the numbers "clustered" around the minimum value of the range (1).

I've read about Box–Muller transform and normal distributions but I'm not quite sure how to use them to achieve the number generator.

How can I achieve such an algorithm using C#?

Upvotes: 1

Views: 1142

Answers (1)

Spektre
Spektre

Reputation: 51933

There are a lot of ways doing this (using uniform distribution prng) here few I know of:

  1. Combine more uniform random variables to obtain desired distribution.

    I am not a math guy but there sure are equations for this. This kind of solution has usually the best properties from randomness and statistical point of view. For more info see the famous:

    but there are limited number of distributions we know the combinations for.

  2. Apply non linear function on uniform random variable

    This is the simplest to implement. You simply use floating randoms in <0..1> range apply your non linear function (that change the distribution towards your wanted shape) on them (while result is still in the <0..1> range) and rescale the result into your integer range for example (in C++):

    floor( pow( random(),5 ) * 100000 )
    

    The problem is that this is just blind fitting of the distribution so you usually need to tweak the constants a bit. It a good idea to render histogram and randomness graphs to see the quality of result directly like in here:

    You can also avoid too blind fitting with BEZIERS like in here:

  3. Distribution following pseudo random generator

    there are two approaches I know of for this the simpler is:

    1. create big enough array of size n
    2. fill it with all values following the distribution

      so simply loop through all values you want to output and compute how many of them will be in n size array (from your distribution) and add that count of the numbers into array. Beware the filled size of the array might be slightly less than n due to rounding. If n is too small you will be missing some less occurring numbers. so if you multiply probability of the least probable number and n it should be at least >=1. After the filling change the n into the real array size (number of really filled numbers in it).

    3. shuffle the array

    4. now use the array as linear list of random numbers

      so instead of random() you just pick a number from array and move to the next one. Once you get into n-th value schuffle the array and start from first one again.


    This solution has very good statistical properties (follows the distribution exactly) but the randomness properties are not good and requires array and occasional shuffling. For more info see:


    The other variation of this is to avoid use of array and shuffling. It goes like this:

    1. get random value in range <0..1>
    2. apply inverse cumulated distribution function to convert to target range

      as you can see its like the #2 Apply non linear function... approach but instead of "some" non linear function you use directly the distribution. So if p(x) is probability of x in range <0..1> where 1 means 100% than we need a function that cumulates all the probabilities up to x (sorry do not know the exact math term in English). For integers:

      f(x) = p(0)+p(1)+...+p(x)
      

      Now we need inverse function g() to it so:

      y = f(x)
      x = g(y) 
      

      Now if my memory serves me well then the generation should look like this:

      y = random(); // <0..1>
      x = g(y);     // probability -> value 
      

      Many distributions have known g() function but for those that do not (or we are too lazy to derive it) you can use binary search on p(x). Too lazy to code it so here slower linear search version:

      for (x=0;x<max;x++) if (f(x)>=y) break;
      

      So when put all together (and using only p(x)) I got this (C++):

      y=random();               // uniform distribution pseudo random value in range <0..1>
      for (f=0.0,x=0;x<max;x++) // loop x through all values
       {
       f+=p(x);                 // f(x) cumulative distribution function
       if (f>=y) break;        
       }
      // here x is your pseudo random value following p(x) distribution
      

    This kind of solution has usually very good both statistical and randomness properties and does not require that the distribution is a continuous function (it can be even just an array of values instead).

Upvotes: 1

Related Questions