Aart Stuurman

Reputation: 3598

Hash integer array

I am using a hash set in which I store arrays of 32-bit integers. This means I need an algorithm to hash an array of integers. I am looking for a 32-bit integer (C# int) hash.

I have tried and adapted two existing algorithms. The four resulting versions, along with how they performed, are shown at the bottom.

My questions are as follows:

1. Do you think the bottom algorithm is good for this purpose?

2. Is there a better algorithm available for this purpose?

Program information

Benchmarks and code

Below are my benchmarks and code, from worst to best performance in my program.

MurmurHash3 using bytes retrieved from the coordinates directly

The code is identical to https://gist.github.com/automatonic/3725443. The array of bytes is retrieved using the following code:

int size = Marshal.SizeOf(typeof(Coordinates2D));
int length = carCoords.Length;
Byte[] bytes = new Byte[size * length];
for (int i = 0; i < length; ++i)
{
    GCHandle pinStructure = GCHandle.Alloc(carCoords[i], GCHandleType.Pinned);
    Marshal.Copy(pinStructure.AddrOfPinnedObject(), bytes, i*size, size);
    pinStructure.Free();
}

// Hash the byte array
return MurMurHash3.Hash(new System.IO.MemoryStream(bytes));

This is incredibly inefficient, because of the copying.

MurmurHash3 using bytes retrieved from the integers in the objects

public static int Hash2(RushHourPathLengthNode.Coordinates2D[] coords)
{
    const uint c1 = 0xcc9e2d51;
    const uint c2 = 0x1b873593;

    uint h1 = seed;
    uint k1 = 0;
    uint streamLength = (uint)coords.Length * 2;

    for (int i = 0, l = coords.Length; i < l; ++i)
    {
        // Do it for X
        byte[] chunk = BitConverter.GetBytes(coords[i].x);

        /* Get four bytes from the input into an uint */
        k1 = (uint)
           (chunk[0]
          | chunk[1] << 8
          | chunk[2] << 16
          | chunk[3] << 24);

        /* bitmagic hash */
        k1 *= c1;
        k1 = rotl32(k1, 15);
        k1 *= c2;

        h1 ^= k1;
        h1 = rotl32(h1, 13);
        h1 = h1 * 5 + 0xe6546b64;


        // Do it for y
        chunk = BitConverter.GetBytes(coords[i].y);

        /* Get four bytes from the input into an uint */
        k1 = (uint)
           (chunk[0]
          | chunk[1] << 8
          | chunk[2] << 16
          | chunk[3] << 24);

        /* bitmagic hash */
        k1 *= c1;
        k1 = rotl32(k1, 15);
        k1 *= c2;

        h1 ^= k1;
        h1 = rotl32(h1, 13);
        h1 = h1 * 5 + 0xe6546b64;
    }

    // finalization, magic chants to wrap it all up
    h1 ^= streamLength;
    h1 = fmix(h1);

    unchecked //ignore overflow
    {
        return (int)h1;
    }
}

This is a lot more efficient now that the copying is gone.

MurmurHash3 using integers

public static int Hash(RushHourPathLengthNode.Coordinates2D[] coords)
{
    const uint c1 = 0xcc9e2d51;
    const uint c2 = 0x1b873593;

    uint h1 = seed;
    uint k1 = 0;
    uint streamLength = (uint)coords.Length * 2;

    for (int i = 0, l = coords.Length; i < l; ++i)
    {
        k1 = (uint)coords[i].x;

        //bitmagic hash
        k1 *= c1;
        k1 = rotl32(k1, 15);
        k1 *= c2;

        h1 ^= k1;
        h1 = rotl32(h1, 13);
        h1 = h1 * 5 + 0xe6546b64;

        k1 = (uint)coords[i].y;

        //bitmagic hash
        k1 *= c1;
        k1 = rotl32(k1, 15);
        k1 *= c2;

        h1 ^= k1;
        h1 = rotl32(h1, 13);
        h1 = h1 * 5 + 0xe6546b64;
    }

    // finalization, magic chants to wrap it all up
    h1 ^= streamLength;
    h1 = fmix(h1);

    unchecked //ignore overflow
    {
        return (int)h1;
    }
}
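The two MurmurHash3 variants above call `seed`, `rotl32` and `fmix` without showing them. A minimal sketch of what they might look like follows. The class name `MurmurHelpers` and the seed value are assumptions made for illustration; the rotate and the finalizer constants follow the public MurmurHash3 32-bit reference code.

```csharp
public static class MurmurHelpers
{
    // Hypothetical seed (assumption): any fixed 32-bit value works, as long as
    // it does not change while the hash set is in use.
    public const uint seed = 144;

    // Rotate x left by r bits, for 0 < r < 32.
    public static uint rotl32(uint x, int r)
    {
        return (x << r) | (x >> (32 - r));
    }

    // MurmurHash3 32-bit finalizer: mixes the bits of h so that every input
    // bit can affect every output bit.
    public static uint fmix(uint h)
    {
        unchecked // multiplication is meant to wrap around
        {
            h ^= h >> 16;
            h *= 0x85ebca6b;
            h ^= h >> 13;
            h *= 0xc2b2ae35;
            h ^= h >> 16;
        }
        return h;
    }
}
```

Note that the reference algorithm mixes in the input length in bytes, whereas the functions above use the number of integers (`coords.Length * 2`). That still gives a usable hash, but the values will not match a standard MurmurHash3 of the same bytes.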

Hash using integer addition and multiplication

int hash = 17;
for (int i = 0, l = carCoords.Length; i < l; ++i)
{
    hash = hash * 31 + carCoords[i].x;
    hash = hash * 31 + carCoords[i].y;
}
return hash;

As you can see, this one is far more efficient. It works well with any prime numbers. As I understand it, there is no scientific proof that this works, which I am not too fond of.
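To show how the multiply-and-add hash plugs into a hash set, here is a minimal sketch of an equality comparer for plain `int[]` keys. The class name `IntArrayComparer` is hypothetical, and a `Coordinates2D[]` version would feed `x` and `y` into the loop in turn. The `unchecked` block makes the intended wrap-around explicit in case the project is compiled with overflow checking enabled.

```csharp
using System.Collections.Generic;

public sealed class IntArrayComparer : IEqualityComparer<int[]>
{
    // Two arrays are equal when they have the same length and the same
    // elements in the same order.
    public bool Equals(int[] a, int[] b)
    {
        if (ReferenceEquals(a, b)) return true;
        if (a == null || b == null || a.Length != b.Length) return false;
        for (int i = 0; i < a.Length; ++i)
        {
            if (a[i] != b[i]) return false;
        }
        return true;
    }

    // Multiply-and-add hash with the same constants as above (17 and 31).
    public int GetHashCode(int[] values)
    {
        unchecked // overflow is expected and harmless here
        {
            int hash = 17;
            for (int i = 0; i < values.Length; ++i)
            {
                hash = hash * 31 + values[i];
            }
            return hash;
        }
    }
}
```

It would be used as `new HashSet<int[]>(new IntArrayComparer())`. Without a comparer, arrays are compared by reference, so two arrays with identical contents would be stored as separate entries.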

According to Michal B., a faster version would use bit shifting. However, testing shows that this is not a successful hash: the problem takes significantly longer to run (it did not finish within 5 minutes). The bit shifting might be good, but it seems like the 31 (a prime number) is crucial.

int hash = 17;
for (int i = 0, l = carCoords.Length; i < l; ++i)
{
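    // Note: in C#, '-' binds tighter than '<<', so each line below is
    // evaluated as hash << (5 - value), not (hash << 5) - value.
    hash = hash << 5 - carCoords[i].x;
    hash = hash << 5 - carCoords[i].y;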
}
return hash;
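One possible reason the shift version behaves so differently is that it does not compute the same value as the multiplication. In C#, `hash << 5 - v` parses as `hash << (5 - v)`, and even with parentheses the shift alone multiplies by 32 rather than 31. The following sketch, using hypothetical helper names and plain `int[]` input, shows a shift-and-subtract form that is equal to `hash * 31 + v` under wrap-around arithmetic:

```csharp
public static class ShiftHash
{
    // Reference version: multiply by 31 and add the next value.
    public static int MultiplyHash(int[] values)
    {
        unchecked
        {
            int hash = 17;
            foreach (int v in values)
            {
                hash = hash * 31 + v;
            }
            return hash;
        }
    }

    // Shift version: (hash << 5) - hash is 32 * hash - hash = 31 * hash,
    // so this returns exactly the same result as MultiplyHash.
    public static int ShiftSubtractHash(int[] values)
    {
        unchecked
        {
            int hash = 17;
            foreach (int v in values)
            {
                hash = ((hash << 5) - hash) + v;
            }
            return hash;
        }
    }
}
```

Whether the shift form is actually faster is a separate question, since the JIT compiler may already perform this kind of rewrite for a constant multiplier. Measuring both in the real program is the only reliable check.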

Upvotes: 9

Views: 8707

Answers (2)

Aart Stuurman

Reputation: 3598

In the end I went with the multiply-and-add algorithm, here with 19 as the multiplier.

int hash = 17;
for (int i = 0, l = carCoords.Length; i < l; ++i)
{
    hash = hash * 19 + carCoords[i].x;
    hash = hash * 19 + carCoords[i].y;
}
return hash;

This is very fast to compute, and for the (small) numbers I am using, the hash works very well.

If you are going to use this, make sure the numbers you use are prime numbers. Because of this, you cannot replace the multiplication with a single bit shift to optimize it, since a shift multiplies by a power of two.

Upvotes: 4

Ani

Reputation: 10896

Have you considered using a space-filling curve to generate the hash? This will minimize (or eliminate) collisions for the chosen resolution (maxX, maxY).

Here are two SO questions and their answers that use this method.

  1. Mapping N-dimensional value to a point on Hilbert curve
  2. Calculate the Hilbert value of a point for use in a Hilbert R-Tree?
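As a rough illustration of the idea, here is a sketch that uses a Z-order (Morton) curve, which is a simpler space-filling curve than the Hilbert curve discussed in the linked questions. It assumes both coordinates are non-negative and fit in 16 bits, and the class and method names are hypothetical. Under that assumption, each (x, y) pair maps to a distinct 32-bit value.

```csharp
public static class MortonCode
{
    // Spread the low 16 bits of v so that they occupy the even bit
    // positions of the result (bit i moves to bit 2 * i).
    private static uint Part1By1(uint v)
    {
        v &= 0x0000FFFF;
        v = (v | (v << 8)) & 0x00FF00FF;
        v = (v | (v << 4)) & 0x0F0F0F0F;
        v = (v | (v << 2)) & 0x33333333;
        v = (v | (v << 1)) & 0x55555555;
        return v;
    }

    // Interleave the bits of x (even positions) and y (odd positions).
    public static uint Encode(ushort x, ushort y)
    {
        return Part1By1(x) | (Part1By1(y) << 1);
    }
}
```

This covers a single coordinate only. An array of coordinates would still need its per-coordinate codes combined into one 32-bit value, and that combining step can reintroduce collisions.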

Hope this helps!

Upvotes: 3
