Madhur Ahuja
Madhur Ahuja

Reputation: 22709

Generate consistent hash of a UUID

I want to generate a consistent hash of a UUID string such as dcc549d8-bd0c-49c2-bff8-f6fa80fb7857, preferably a number between 0 to N.

What is the best and fastest way to do it?

Update: I am thinking of using CRC32. Any pros/cons of it?

Upvotes: 6

Views: 11059

Answers (3)

id01
id01

Reputation: 1597

If you're using v4 UUID generation, all digits except for two already contain pseudorandom values, all you need to do is extract them to the form you want.

From the Spec:

4.4.  Algorithms for Creating a UUID from Truly Random or
      Pseudo-Random Numbers

   The version 4 UUID is meant for generating UUIDs from truly-random or
   pseudo-random numbers.

   The algorithm is as follows:

   o  Set the two most significant bits (bits 6 and 7) of the
      clock_seq_hi_and_reserved to zero and one, respectively.

   o  Set the four most significant bits (bits 12 through 15) of the
      time_hi_and_version field to the 4-bit version number from
      Section 4.1.3.

   o  Set all the other bits to randomly (or pseudo-randomly) chosen
      values.

To extract the pseudorandom values, we simply remove the first hex character (which contains the 4 most significant bits) of the third and fourth sections separated by "-" and convert it to the base you want. Because a UUID is always the same length, we can remove the same indexes each time (14 and 19).

Unfortunately, as Javascript only supports up to 32-bit integers, we need to feed Number.parseInt groups of 8 hex characters (32 bits) separately, then add the modulos to minimize bias.

Therefore, our code would be:

function parseUUID(uuid, N) {
    var pseudorandomBytes = uuid.substring(0, 14) + uuid.substring(15, 19) + uuid.substring(20); // Splice out the bytes which contain metadata
    pseudorandomBytes = pseudorandomBytes.replace(/-/g, ""); // Remove all dashes
    var accumulator = 0; // Accumulate the sums modulo N of every eight hex characters and the remainder
    pseudorandomBytes.match(/.{1,8}/g).forEach(function (a) { accumulator = (accumulator + (Number.parseInt(a, 16) % N)) % N; });
    return accumulator; // Return the result
}

Upvotes: 1

Yaroslav Gaponov
Yaroslav Gaponov

Reputation: 2099

If thinking about the nature of UUID example. I would go in that direction.

const INIT_NUMBER = 271;

function hash(uuid, N) {
  const x = uuid.split("-").reduce((a,b) => a ^ Number.parseInt(b, 16), INIT_NUMBER) ;
  return arguments.length === 1 ? x : x % N;
}

const a = hash("dcc549d8-bd0c-49c2-bff8-f6fa80fb7857");            
const b = hash("dcc549d8-bd0c-49c2-bff8-f6fa80fb7857", 256);

console.log(a, b);
  
  

Upvotes: 5

jack
jack

Reputation: 622

What kind of hash would you like? The 'best' choice might not be fastest and will depend on what you're using the hash for.

For md5, you could do:

var crypto = require('crypto');

var md5sum = crypto.createHash('md5');
md5sum.update(uuid);
var b64 = md5sum.digest('base64')

You could then use a base64 library to convert it to a number if that's what you need.

Node crypto stuff, including other hashing algorithms that might be more appropriate for your case (md5 is faster but less secure), is documented here: https://nodejs.org/api/crypto.html

Upvotes: 6

Related Questions