user2882307

64bit random number between a range

I have been searching for a couple of days for a function that takes two arguments, a low value and a high value (both 64-bit ints), and generates a random number within that range. The problems I keep encountering are that the number isn't a 64-bit int, or that the numbers at the edges are more common than the ones in the middle.

Here is some code: it just keeps returning either -1 or 0...

#include<stdio.h>
#include<stdlib.h>
#include<inttypes.h>

int64_t range1=0,range2=18446744073709551614;

int64_t getRandomInRange(int64_t low, int64_t high )
{
    int64_t base_random = rand(); 
    if (RAND_MAX==base_random) return getRandomInRange(low, high);
    int range       = high-low,
        remainder   = RAND_MAX%range,
        bucket      = RAND_MAX/range;
    if (base_random < RAND_MAX-remainder) {
        return low+base_random/bucket;
    } else {
        return getRandomInRange(low, high);
    }
}

int main () {
    int i;
    for (i=0;i<100;i++) {
        printf("random number: %lld\n",getRandomInRange(range1, range2));
    }
}

Upvotes: 0

Views: 2835

Answers (2)

rici

Reputation: 241731

Your code returns either 0 or -1 because 18446744073709551614 is far too large to fit in an int64_t, whose maximum is 2^63-1. (It would only just fit in a uint64_t: it is 2^64-2, and the largest number that can fit in a k-bit unsigned integer is 2^k-1.) So the initializer is converted out of range, which is implementation-defined and typically wraps to -2. (gcc and clang (at least) warned you about this, even without -Wall.)
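To make those limits concrete, here is a minimal sketch. The helper names `fits_in_int64` and `check_question_constant` are made up for illustration; the constant is the one from the question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if v can be stored in an int64_t unchanged. */
bool fits_in_int64(uint64_t v) {
  return v <= (uint64_t)INT64_MAX;  /* INT64_MAX is 2^63 - 1 */
}

/* Usage sketch: the question's constant fits in uint64_t but not in int64_t. */
void check_question_constant(void) {
  uint64_t high = UINT64_C(18446744073709551614);  /* 2^64 - 2 */
  assert(high == UINT64_MAX - 1);
  assert(!fits_in_int64(high));
}
```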

At any rate, it is not so difficult to produce the library function you are seeking, provided you have some mechanism to generate random 64-bit unsigned integers. A good option would be the Mersenne Twister library. However, for a demonstration we can use only standard C library functions, in this case lrand48, which produces a uniformly distributed integer in the range [0, 2^31-1]. Since that range provides only 31 bits of randomness, we'll need to call it several times in order to produce 64 bits.

#define _XOPEN_SOURCE
#include <stdlib.h>
#include <stdint.h>

uint64_t urand64() {
  uint64_t hi = lrand48();
  uint64_t md = lrand48();
  uint64_t lo = lrand48();
  return (hi << 42) + (md << 21) + lo;
}

To get an unbiased sample from the range [low, high), we need to restrict our random number generation to some multiple of high - low. The range of urand64 is of size 2^64, so we need to exclude 2^64 mod (high - low) values. Unfortunately, unless we have an unsigned int longer than 64 bits, we cannot actually compute the modulus directly. However, we can use the identity:

(m mod k + n mod k) mod k = (m + n) mod k.

In this case, we'll choose m as 2^64-1 and n as 1, to avoid having to compute n mod (high - low). Also, it's easy to demonstrate that unless k is an exact power of 2, it's impossible for (2^64-1) mod k + 1 mod k to be exactly k, whereas if k is an exact power of 2, the desired 2^64 mod k is 0. We can use the following simple test for a power of 2, whose explanation can be found elsewhere:

#include <stdbool.h>
// Parentheses are required because == binds more tightly than &.
bool is_power_of_2(uint64_t x) {
  return x == (x & -x);
}
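As a check on the identity above, here is a small sketch that computes 2^64 mod k using only 64-bit arithmetic. The name `pow2_64_mod` is hypothetical, and this variant uses a final `%` in place of the power-of-2 test; either approach should give the same count of values to exclude.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns 2^64 mod k for k >= 1.
   Applies (m mod k + n mod k) mod k with m = 2^64 - 1 and n = 1. */
uint64_t pow2_64_mod(uint64_t k) {
  assert(k != 0);
  uint64_t m_mod_k = UINT64_MAX % k;  /* (2^64 - 1) mod k, at most k - 1 */
  uint64_t n_mod_k = 1 % k;           /* 1 mod k: 1 unless k == 1 */
  /* The sum is at most k, so it cannot overflow; the final reduction
     maps a sum of exactly k (the power-of-2 case) to 0. */
  return (m_mod_k + n_mod_k) % k;
}
```

For example, `pow2_64_mod(10)` is 6, since 2^64 = 18446744073709551616.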

So we can define:

uint64_t unsigned_uniform_random(uint64_t low, uint64_t high) {
  static const uint64_t M = ~(uint64_t)0; 
  uint64_t range = high - low;
  uint64_t to_exclude = is_power_of_2(range) ? 0
                                             : M % range + 1;
  uint64_t res;
  // Eliminate `to_exclude` possible values from consideration.
  while ((res = urand64()) < to_exclude) {}
  return low + res % range;
}

Note that in the worst case, the number of values to exclude is 2^63-1, which is slightly less than half the range of possible values. So, in the worst case, we will require, on average, two calls to urand64 before we find a satisfactory value.
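The estimate can be reproduced numerically. The following is a rough sketch with a hypothetical helper `expected_draws`; it uses `double`, so the result is only approximate for very large exclusion counts.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: expected number of 64-bit draws that rejection
   sampling needs for a given range (range >= 1). */
double expected_draws(uint64_t range) {
  assert(range != 0);
  uint64_t excluded = (UINT64_MAX % range + 1) % range;  /* 2^64 mod range */
  const double total = 18446744073709551616.0;           /* 2^64 */
  /* Each draw is accepted with probability (total - excluded) / total. */
  return total / (total - (double)excluded);
}
```

With range = 2^63+1 (the worst case) this comes out at about 2; for any power of 2 it is exactly 1.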

Finally, we need to deal with the fact that we're asked to produce signed integers, rather than unsigned integers. However, that's not a problem because the necessary conversions are well-defined.

int64_t uniform_random(int64_t low, int64_t high) {
  static const uint64_t OFFSET = ((uint64_t)1) << 63;
  uint64_t ulow =  (uint64_t)low + OFFSET;
  uint64_t uhigh = (uint64_t)high + OFFSET;
  uint64_t r = unsigned_uniform_random(ulow, uhigh);
  // Conform to the standard; a good compiler should optimize.
  if (r >= OFFSET) return r - OFFSET;
  else             return (int64_t)r - (int64_t)(OFFSET - 1) - 1;
}
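The offset trick used above can be looked at in isolation. This is a sketch with hypothetical helper names (`to_biased`, `from_biased`, `biased_roundtrip_demo`): adding 2^63 maps the signed range onto the unsigned range while preserving order, and the inverse avoids any out-of-range conversion.

```c
#include <assert.h>
#include <stdint.h>

/* Maps INT64_MIN..INT64_MAX onto 0..UINT64_MAX, preserving order.
   Conversion to uint64_t is defined modulo 2^64, so this cannot overflow. */
uint64_t to_biased(int64_t v) {
  return (uint64_t)v + (UINT64_C(1) << 63);
}

/* Inverse mapping, written so every intermediate value fits in int64_t. */
int64_t from_biased(uint64_t u) {
  const uint64_t offset = UINT64_C(1) << 63;
  if (u >= offset) return (int64_t)(u - offset);  /* 0 .. INT64_MAX */
  return (int64_t)u - (int64_t)(offset - 1) - 1;  /* INT64_MIN .. -1 */
}

/* Usage sketch: the endpoints and the ordering around zero. */
void biased_roundtrip_demo(void) {
  assert(to_biased(INT64_MIN) == 0);
  assert(to_biased(INT64_MAX) == UINT64_MAX);
  assert(to_biased(-1) < to_biased(0));
  assert(from_biased(to_biased(-5)) == -5);
}
```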

Upvotes: 0

Aki Suihkonen

Reputation: 20027

Taking a random value modulo N doesn't lead to a uniform distribution, unless N divides the size of the source range R exactly:

 rnd = 0..15,  range = 9.

 0 1 2 3 4 5 6 7 8  <-- 0..8 % 9 
 0 1 2 3 4 5 6      <-- 9-15 % 9
----------------------------------
 2 2 2 2 2 2 2 1 1    <-- sum = 16

Likewise, if one tries to avoid that by multiplying by a scale factor, e.g. 9 / 16:

 rnd = 0..15,   range = 9,   reducing function = rnd * 9 >> 4, one has
 0 1 2 3 4 5 6 7 8    for rnd = 0, 2, 4, 6, 8, 9, 11, 13, 15    and
 0 1 2 3   5 6 7      for rnd = 1, 3, 5, 7, 10, 12, 14
------------------------
 2 2 2 2 1 2 2 2 1     <-- sum = 16

This is the so-called pigeonhole principle in action.
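Both tallies above can be reproduced with a few lines of C. This is only an illustrative sketch; `count_buckets`, `INPUTS` and `BUCKETS` are names chosen here, not part of any library.

```c
#include <assert.h>

enum { INPUTS = 16, BUCKETS = 9 };

/* Counts how often each output 0..8 occurs when every input 0..15 is
   reduced by modulo (by_mod) and by scaling with 9/16 (by_scale). */
void count_buckets(int by_mod[BUCKETS], int by_scale[BUCKETS]) {
  for (int b = 0; b < BUCKETS; b++) by_mod[b] = by_scale[b] = 0;
  for (int rnd = 0; rnd < INPUTS; rnd++) {
    by_mod[rnd % BUCKETS]++;
    by_scale[(rnd * BUCKETS) >> 4]++;
  }
  /* Sanity check: every input lands in exactly one bucket. */
  int total = 0;
  for (int b = 0; b < BUCKETS; b++) total += by_mod[b];
  assert(total == INPUTS);
}
```

Since 16 inputs cannot be split evenly over 9 buckets, some buckets must receive two inputs and others one, whichever reducing function is used.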

One proper way to create a uniform distribution of random numbers is to generate ceil(log2(N)) random bits repeatedly, until the number represented by the bits is less than the range:

 int rand_orig(); // the "original" random function returning values from 0..2^n-1
                  // We assume that n = ceil(log2(N));
 int rand(int N)
 {
     int y;
     do {
          y = rand_orig();
     } while (y >= N);
     return y;
 }

This can of course be improved if rand_orig() returns much larger values, n >> log2(N), in a uniform distribution; then it suffices to discard only those values of rand_orig() that are not below the largest multiple of N that fits in its range, and to reduce the remaining values with modulo.
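A minimal sketch of that improvement, assuming the standard rand() stands in for rand_orig() and that 1 <= n <= RAND_MAX. The name `rand_below` is hypothetical, and the quality of the result is no better than that of rand() itself.

```c
#include <assert.h>
#include <stdlib.h>

/* Sketch: uniform integer in [0, n) built on rand(), for 1 <= n <= RAND_MAX. */
int rand_below(int n) {
  assert(n >= 1 && n <= RAND_MAX);
  /* rand() yields RAND_MAX + 1 distinct values; keep only the largest
     prefix [0, limit) whose size is a multiple of n. */
  unsigned long span = (unsigned long)RAND_MAX + 1UL;
  unsigned long limit = span - span % (unsigned long)n;
  unsigned long r;
  do {
    r = (unsigned long)rand();
  } while (r >= limit);  /* reject the few values past the last full multiple */
  return (int)(r % (unsigned long)n);
}
```

Because fewer than n values are rejected out of RAND_MAX + 1, the loop rarely repeats when n is small relative to RAND_MAX.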

Another way would be to create a method that balances the values (N > range) uniformly to all buckets, e.g.

 #define CO_PRIME 1 // Better to have some large prime 2^(n-1) < CO_PRIME < 2^n-1
 int rand_orig();   // some function returning random numbers in range 0..2^n-1
 int rand(int N)    // N is the range
 {
     static int x;
     int y = rand_orig();
     int new_rand = (x + y) % N;
     x = (x + CO_PRIME) % N;
     return new_rand;
 }

Now the period of this balancing term x is N, which leads to at least a uniform distribution.

Upvotes: 1
