Andy
Andy

Reputation: 1090

Hashing to uniformly distribute value over a large range

I want to devise an algorithm which takes a set of values and distributes it uniformly over a much larger range. eg. i have 1000 values and want to distribute them over a range of value 2^16. Also, the input values can change continuously and i need to keep parsing each input value through the hash function so that it gets distributed uniformly over my output range.

What hashing algorithm should i use for this? I am writing the code in Java.

Upvotes: 3

Views: 3182

Answers (2)

MikeAinOz
MikeAinOz

Reputation: 128

I'm sure this has a name, but this is what we used to do with ISAM files back in the dark ages

  1. Increment a number eg 16001
  2. Reverse the String ie. 10061 and you have your hash
  3. You might want to reverse the string bitwise

This produces a nice even spread. we used to use it with job numbers so that you could retrieve the job fairly easily, so if you have a 'magic number' candidate this can be useful.

Upvotes: 1

Tony Ennis
Tony Ennis

Reputation: 12309

If you're just hashing integers, here's one way.

public class Hasho {

    private static final Long LARGE_PRIME =  948701839L;
    private static final Long LARGE_PRIME2 = 6920451961L;

    public static void main(String[] args) {
        for (int i = 0; i < 100; i++) {
            System.out.println(i + " -> " + hash(i));
        }
    }

public static int hash(int i) {
    // Spread out values
    long scaled = (long) i * LARGE_PRIME;

    // Fill in the lower bits
    long shifted = scaled + LARGE_PRIME2;

    // Add to the lower 32 bits the upper bits which would be lost in
    // the conversion to an int.
    long filled = shifted + ((shifted & 0xFFFFFFFF00000000L) >> 32);

    // Pare it down to 31 bits in this case.  Replace 7 with F if you
    // want negative numbers or leave off the `& mask` part entirely.
    int masked = (int) (filled & 0x7FFFFFFF);
    return masked;
    }
}

This is merely an example to show how it can be done. There is some serious math in a professional quality hash function.

Upvotes: 2

Related Questions