Generating a unique Hashcode based on Doubles with several decimal places

Question

I have a custom object that we'll call "MyObject" It has three major properties called X,Y, and Z that determine whether it's unique or not. I have a HashSet containing 400,000 "MyObject"s in a HashSet. My original solution for generating a unique hashcode was simple and fast.

return Convert.ToInt32(X * 76 + Y * 100 + Z * 23);

However, the integer generated from this was not unique enough. With the current HashCode, these two points match, even though the Y is slightly different.

X: 392598.200000000190 Y: 4935367.900000000400

X: 392598.200000000190 Y: 4935367.900580000100

What I've tried:

double value = (X * 101 + Y * 89 + Z * 56);
return value.GetHashCode();

Extremely accurate, with 1 - 10,000 records, it only takes a few seconds to compute the differences. However, with 400,000 records, it bogs down hard. I let it run for 17 hours, and it still hadn't returned my results.
Converting to a string, then getting hashcode of the string. Precise, but uselessly slow.
Increasing the multipliers of X, Y and Z. The number generated becomes too large. I tried using the method used here: http://msdn.microsoft.com/en-us/library/system.object.gethashcode.aspx
```
return ((int)value ^ (int)(value >> 32));
```

However it doesn't allow for anymore integers. I also worry that even if I increase the size, it may becomes uselessly slow like my other solutions.

I can't do additional checks if it matches, since 390,000 of 400,000 records will likely match

What is the best solution? Or is there a way to make my two already precise operations significantly faster? I was thinking about removing all zeros from the values after the decimal place until it meets a non-zero, and then using my original logic, ie (45.0002030 would become 45.2030)

Matthew Watson · Accepted Answer

You can easily calculate a reasonable hash code from several objects like so:

public override int GetHashCode()
{
    int hash = 17;

    hash = hash * 23 + X.GetHashCode();
    hash = hash * 23 + Y.GetHashCode();
    hash = hash * 23 + Z.GetHashCode();

    return hash;
}

You can add as many hash codes to this as you like as you add new fields to your class that must contribute to the hash code.

This is generally a fast operation.

Also note that if you have immutable types, you can speed things up by either calculating the hash code in the immutable type's constructor or by lazily calculating it on demand (and then caching the result).

[EDIT]

Where you saw your code slowing down, are you sure that wasn't because you were getting a lot of hashcode collisions rather than the hashcode calculation itself being too slow?

For example, if you just returned 0 for every hash code, it would be very fast but adding to the hash collection would be extremely slow after a while.

I would expect the time taken to calculate the hash codes like this would be dwarfed by the amount of time taken actually adding the items to the collection.

[Second Edit]

The implementation of double.GetHashCode() (obtained via Reflector) is:

public override unsafe int GetHashCode()
{
    double num = this;
    if (num == 0.0)
    {
        return 0;
    }
    long num2 = *((long*) &num);
    return (((int) num2) ^ ((int) (num2 >> 32)));
}

which looks pretty quick to me.

Generating a unique Hashcode based on Doubles with several decimal places

Answers (1)

Related Questions