Sam Turani
Sam Turani

Reputation: 11

Implementing a hash function in Python

I have a set of 100,000 IDs I need to hash into an array that has 50 buckets.

The IDs are of the form: AA00000...AA99999. I know there are functions like md5 available, but those produce digests not indexes into an array. How could I implement a hash such that for each ID it returns an index into my array?

Thanks in advance. (Implementing this in Python)

Upvotes: 0

Views: 114

Answers (1)

Martijn Pieters
Martijn Pieters

Reputation: 1121176

Just use the built-in hash() function, and use modulus to limit the result to 50:

hash(yourid) % 50

From the documentation:

Return the hash value of the object (if it has one). Hash values are integers.

For your given inputs, this picks slots uniformly enough:

>>> from collections import Counter
>>> histogram = Counter(hash('AA{:05d}'.format(i)) % 50 for i in range(100000))
>>> for i in range(50):
...     print '{:4d}: {}'.format(histogram[i], '*' * (histogram[i] // 40))
... 
1932: ************************************************
1932: ************************************************
1941: ************************************************
1941: ************************************************
1908: ***********************************************
1908: ***********************************************
1974: *************************************************
1974: *************************************************
2012: **************************************************
2012: **************************************************
1898: ***********************************************
1898: ***********************************************
1954: ************************************************
1954: ************************************************
1925: ************************************************
1925: ************************************************
1995: *************************************************
1995: *************************************************
1982: *************************************************
1982: *************************************************
2023: **************************************************
2023: **************************************************
2025: **************************************************
2025: **************************************************
2070: ***************************************************
2070: ***************************************************
2042: ***************************************************
2042: ***************************************************
2028: **************************************************
2028: **************************************************
2120: *****************************************************
2120: *****************************************************
2064: ***************************************************
2064: ***************************************************
2100: ****************************************************
2100: ****************************************************
2057: ***************************************************
2057: ***************************************************
2039: **************************************************
2039: **************************************************
1981: *************************************************
1981: *************************************************
1956: ************************************************
1956: ************************************************
2000: **************************************************
2000: **************************************************
1982: *************************************************
1982: *************************************************
1992: *************************************************
1992: *************************************************

Upvotes: 2

Related Questions