Dmitry Minkovsky
Dmitry Minkovsky

Reputation: 38143

Generating a uniform distribution of 64-bit integers

I'm using

var crypto = require('crypto');
var uid = crypto.pseudoRandomBytes(8);

to generated 64-bit IDs.

I don't need these to be truly random, but I would like them to be uniformly distributed between 0 and ffffffffffffffff. However, they are not. Generating 1,000,000 64-bit integers this way:

var bignum = require('bignum');
var crypto = require('crypto');

var randomBignum = function() { 
    return bignum.fromBuffer(crypto.pseudoRandomBytes(8));
}

var avg = randomBignum();
var avg_len = avg.bitLength();
var lengths = {};

for (var i = 0; i < 1e6; i++) {
    var big = randomBignum();
    avg = avg.add(big).div(2);
    var length = big.bitLength();
    if (typeof lengths[length] === 'undefined') { lengths[length] = 0; }
    lengths[length] += 1;
    avg_len = (avg_len + length) / 2;
}
console.log('Average integer: %s', avg);
console.log('Average bit length: %s', avg_len);
console.info('Bit length distribution: %s', require('util').inspect(lengths));

produces:

Average integer: 8386866841744540769
Average bit length: 63.11672078688376
Bit length distribution: {
  '45': 1,
  '47': 2,
  '48': 7,
  '49': 15,
  '50': 23,
  '51': 66,
  '52': 114,
  '53': 248,
  '54': 521,
  '55': 955,
  '56': 1905,
  '57': 4019,
  '58': 7742,
  '59': 15552,
  '60': 31396,
  '61': 63048,
  '62': 125182,
  '63': 249271,
  '64': 499933 }

I may be confused about the stats here, but these are not uniformly distributed, right? The fact that each byte is individually randomly generated makes it improbable that successive 0-bytes will be generated, so you get exponentially fewer small numbers.

What is a good and fast way to generate uniformly distributed, 64-bit IDs in Node? Should I use Math.random and multiply the result by ffffffffffffffff?

Upvotes: 1

Views: 778

Answers (1)

Ry-
Ry-

Reputation: 224921

They (probably) are uniform! The probability of getting 0 is the same as the probability of getting 18446744073709551615, but there are 4611686018427387904 numbers in that range with no leading zeroes, and only one with 64 of them. You should expect to see each bit count appearing approximately twice as much as the previous one, which your test agrees with.

Upvotes: 3

Related Questions