Reputation: 443
I noticed that the hash codes I got from other objects were different depending on whether I built for x86 or x64. Up until now I have implemented most of my own hashing functions like this:
int someIntValueA;
int someIntValueB;
const int SHORT_MASK = 0xFFFF;

public override int GetHashCode()
{
    // Pack the low 16 bits of A into the low half and the low 16 bits of B into the high half.
    return (someIntValueA & SHORT_MASK) + ((someIntValueB & SHORT_MASK) << 16);
}
Will storing the values in a long and getting the hashcode from that give me a wider range as well on 64-bit systems, or is this a bad idea?
public override int GetHashCode()
{
    // Cast to long before shifting; shifting an int by 32 would be a no-op (shift counts wrap at 31).
    long maybeBiggerSpectrumPossible = someIntValueA + ((long)someIntValueB << 32);
    return maybeBiggerSpectrumPossible.GetHashCode();
}
Upvotes: 1
Views: 347
Reputation: 113352
The approach you suggest won't make anything any better (quite the opposite).
However…
SpookyHash, for example, is designed to work particularly quickly on 64-bit systems, because when working out the math the author was thinking about what would be fast on a 64-bit system. xxHash has 32-bit and 64-bit variants that are designed to give comparable hash quality at better speed for 32-bit and 64-bit computation respectively.
The general idea of making use of the different performance of different arithmetic operations on different machines is a valid one.
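For illustration, picking a variant by bitness could look something like this (just a sketch; it assumes the System.IO.Hashing NuGet package for the xxHash implementations, and HashBlob is a made-up helper name, not anything from the libraries mentioned above):

static int HashBlob(byte[] data)
{
    // Pick the xxHash variant tuned for the platform's native word size.
    byte[] hash = IntPtr.Size == 8
        ? System.IO.Hashing.XxHash64.Hash(data)  // 64-bit variant, faster on x64
        : System.IO.Hashing.XxHash32.Hash(data); // 32-bit variant, a better fit for x86
    return BitConverter.ToInt32(hash, 0);        // keep 32 bits for GetHashCode-style use
}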
And your general idea of making use of larger intermediate storage in the hash calculation is also a valid one, as long as those extra bits make their way into subsequent operations.
So at a very general level, the answer is yes, even if your particular implementation fails to come through with that.
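As a sketch of what "wider intermediate storage whose extra bits feed back into the result" could look like for the two fields in your question (the FNV-style constants are just an illustrative choice of mine, not something your implementation has to use):

public override int GetHashCode()
{
    unchecked
    {
        // Mix both ints into a 64-bit accumulator (FNV-1a-style, a word at a time)...
        ulong h = 14695981039346656037UL;                // 64-bit FNV offset basis
        h = (h ^ (uint)someIntValueA) * 1099511628211UL; // 64-bit FNV prime
        h = (h ^ (uint)someIntValueB) * 1099511628211UL;
        // ...then fold the upper 32 bits back in so they actually affect the result.
        return (int)(h ^ (h >> 32));
    }
}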
Now, in practice, when you're sitting down to write a hashcode implementation should you worry about this?
Well, it depends. For a while I was very bullish about using algorithms like SpookyHash, and it does very well (even on 32-bit systems) when the hash is based on a large amount of source data. But on the other hand it can be better, especially when used with smaller hash-based sets and dictionaries, to be crappy really fast than fantastic slowly. So there isn't a one-size-fits-all answer. With just two input integers, your initial solution is likely to beat a super-avalanchey algorithm like xxHash or SpookyHash for many uses. You could perhaps do better if you also had a >> 16 so that the shift becomes a rotate (fun fact, some jitters are optimised for that), but we're not touching on 64- vs 32-bit versions in that at all.
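For what it's worth, a rotate-based combiner could look roughly like this (a sketch only; System.Numerics.BitOperations needs .NET Core 3.0 or later, and on older frameworks you'd write the rotate by hand as (x << 16) | (x >> 16)):

public override int GetHashCode()
{
    unchecked
    {
        // Rotating keeps all 32 bits of B in play instead of discarding half of them,
        // and some jitters compile RotateLeft down to a single rotate instruction.
        uint a = (uint)someIntValueA;
        uint b = System.Numerics.BitOperations.RotateLeft((uint)someIntValueB, 16);
        return (int)(a ^ b);
    }
}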
The cases where you do find a big possible improvement from taking a different approach on 64-bit and 32-bit are where there's a large amount of data to mix in, especially if it's in a blittable form (like string or byte[]) that you can access via a long* or int* depending on framework.
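A rough sketch of that shape for a byte[], reading eight bytes at a time through a pointer (unsafe code; HashBytes and the FNV-style constants are illustrative choices of mine, and a real version might switch between long* and int* reads based on bitness, which is the point above):

public static unsafe int HashBytes(byte[] data)
{
    unchecked
    {
        ulong h = 14695981039346656037UL; // arbitrary 64-bit seed
        fixed (byte* p = data)
        {
            int i = 0;
            // Mix eight bytes at a time while the data allows it
            // (assumes unaligned reads are acceptable on the target platform)...
            for (; i + sizeof(ulong) <= data.Length; i += sizeof(ulong))
                h = (h ^ *(ulong*)(p + i)) * 1099511628211UL;
            // ...then finish the trailing bytes one at a time.
            for (; i < data.Length; i++)
                h = (h ^ p[i]) * 1099511628211UL;
        }
        // Fold the upper half into the final 32-bit hash.
        return (int)(h ^ (h >> 32));
    }
}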
So, generally you can ignore the question of bitness, but if you find yourself thinking "this hashcode has to go through so much stuff to get an answer; can I make it better?" then maybe it's time to consider such matters.
Upvotes: 1
Reputation: 660455
No, that will be far worse.
Suppose your int values are typically in the range of a short: between -30000 and +30000. And suppose further that most of them are near the middle, say, between 0 and 1000. That's pretty typical. With your first hash code you get all the bits of both ints into the hash code and they don't interfere with each other; the number of collisions is zero under typical conditions.
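For instance, a quick spot check of your mask-and-shift combiner with a couple of arbitrary small values:

// A occupies the low 16 bits, B the high 16 bits, so small distinct pairs can't collide:
Console.WriteLine((5 & 0xFFFF) + ((7 & 0xFFFF) << 16)); // 458757 (0x00070005)
Console.WriteLine((7 & 0xFFFF) + ((5 & 0xFFFF) << 16)); // 327687 (0x00050007)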
But when you do your trick with a long, then you rely on what the long implementation of GetHashCode does, which is to xor the upper 32 bits with the lower 32 bits. So your new implementation is just a slow way of writing int1 ^ int2, which in the typical scenario has almost all zero bits, and hence collisions all over the place.
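A couple of hypothetical pairs make the collision problem obvious:

// With typical small values, int1 ^ int2 collapses many distinct pairs onto one code:
Console.WriteLine(3 ^ 3);     // 0 -- every (n, n) pair hashes to 0
Console.WriteLine(250 ^ 250); // 0
Console.WriteLine(5 ^ 7);     // 2
Console.WriteLine(7 ^ 5);     // 2 -- swapping the operands collides as well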
Upvotes: 7