Bruce

Reputation: 235

How to smooth a random distribution?

I'm trying to randomize the appearance of characters in a game, using each character's name as the seed, so that if you ever meet "Bob" in the game, he'll always have the same hair, eyes, etc.

Right now I'm just generating a number from their name (by adding all the character codes) and then using modulus to decide what options they have.

Example: Bob's seed is 275 (66 + 111 + 98). 275 % 40 (the number of hair styles) results in 35.
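
A minimal sketch of that scheme, assuming C (the question does not name a language, and `name_sum` and `pick_option` are hypothetical names used only for illustration):

```c
#include <stddef.h>

/* Hypothetical helper: adds up the character codes of a name,
   as described in the question. */
unsigned long name_sum(const char *name) {
    unsigned long sum = 0;
    for (const unsigned char *p = (const unsigned char *)name; *p != '\0'; p++)
        sum += *p;
    return sum;
}

/* Maps a name to an option index in [0, option_count).
   option_count must be greater than zero. */
unsigned long pick_option(const char *name, unsigned long option_count) {
    return name_sum(name) % option_count;
}
```

Because a plain sum ignores letter order, different names can easily share a seed; for example, "Amy" and "May" add up to the same value in ASCII and so always get the same options.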

That works fine but for a list of 350+ names, the distribution looks like this:

hair style: 0 / # of people using it: 15
hair style: 1 / # of people using it: 8
hair style: 2 / # of people using it: 4
hair style: 3 / # of people using it: 5
hair style: 4 / # of people using it: 7
hair style: 5 / # of people using it: 5
hair style: 6 / # of people using it: 7
hair style: 7 / # of people using it: 14
hair style: 8 / # of people using it: 12
hair style: 9 / # of people using it: 6
hair style: 10 / # of people using it: 7
hair style: 11 / # of people using it: 2
hair style: 12 / # of people using it: 7
hair style: 13 / # of people using it: 10
hair style: 14 / # of people using it: 11
hair style: 15 / # of people using it: 7
hair style: 16 / # of people using it: 12
hair style: 17 / # of people using it: 7
hair style: 18 / # of people using it: 6
hair style: 19 / # of people using it: 10
hair style: 20 / # of people using it: 5
hair style: 21 / # of people using it: 10
hair style: 22 / # of people using it: 11
hair style: 23 / # of people using it: 3
hair style: 24 / # of people using it: 6
hair style: 25 / # of people using it: 8
hair style: 26 / # of people using it: 5
hair style: 27 / # of people using it: 11
hair style: 28 / # of people using it: 10
hair style: 29 / # of people using it: 6
hair style: 30 / # of people using it: 13
hair style: 31 / # of people using it: 11
hair style: 32 / # of people using it: 10
hair style: 33 / # of people using it: 12
hair style: 34 / # of people using it: 3
hair style: 35 / # of people using it: 11
hair style: 36 / # of people using it: 9
hair style: 37 / # of people using it: 4
hair style: 38 / # of people using it: 10
hair style: 39 / # of people using it: 15

The distribution isn't very smooth; it's all over the place (unsurprisingly). I'm going to run into a lot of people with hair style #0 and next to no one with hair style #11.

How can I smooth this out a bit?

Upvotes: 3

Views: 448

Answers (1)

rob mayoff

Reputation: 385870

You might have better luck if you use a real hash function instead of just summing the ASCII/Unicode code points. The djb2 function is quite popular for ASCII input:

unsigned long hash(const unsigned char *str) {
    unsigned long hash = 5381;
    int c;

    /* Consume one byte at a time until the terminating NUL. */
    while ((c = *str++) != 0)
        hash = ((hash << 5) + hash) + c; /* hash * 33 + c */

    return hash;
}

So, for example,

unsigned long hairStyle = hash(characterName) % kHairStyleCount;
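
To check whether the hash actually evens things out for a given name list, one option is to tally how many names land on each style. The following is only a sketch: `djb2_32` is a fixed-width variant of the function above (an assumption made here so that a name maps to the same style regardless of how wide `unsigned long` is on the platform), and `tally_styles` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* djb2 with a fixed 32-bit width. This is a variation on the function
   above, assumed here so results do not depend on the platform's
   unsigned long size. */
uint32_t djb2_32(const char *name) {
    uint32_t hash = 5381u;
    for (const unsigned char *p = (const unsigned char *)name; *p != '\0'; p++)
        hash = hash * 33u + *p; /* wraps modulo 2^32 */
    return hash;
}

/* Hypothetical helper: counts how many names map to each style index.
   counts must have room for style_count entries; style_count must be > 0. */
void tally_styles(const char *const *names, size_t name_count,
                  unsigned *counts, uint32_t style_count) {
    for (uint32_t i = 0; i < style_count; i++)
        counts[i] = 0;
    for (size_t i = 0; i < name_count; i++)
        counts[djb2_32(names[i]) % style_count]++;
}
```

Printing `counts` for the full name list gives a table like the one in the question, which makes it easy to compare the two approaches on the same data.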

See also "What's a good hash function for English words?"

Upvotes: 4
