Reputation: 1155
When hash()
method is called in Python 3, I noticed that it doesn't return a long-length integer when taking in int
data type but with string
type.
Is this supposed to work this way? If that actually is the case, for the int
type to have a short hash value, won't it cause collision since it's too short?
for i in [i for i in range(5)]:
print(hash(i))
print(hash("abc"))
The Result:
0
1
2
3
4
4714025963994714141
Upvotes: 0
Views: 3758
Reputation: 4377
In CPython, default Python interpreter implementation, built-in hash
is done in this way:
For numeric types, the hash of a number x is based on the reduction of x modulo the prime P = 2**_PyHASH_BITS - 1. It's designed so that hash(x) == hash(y) whenever x and y are numerically equal, even if x and y have different types
_PyHASH_BITS
is 61
(64-bit systems) or 31
(32-bit systems)(defined here)
So on 64-bit system built-in hash
looks like this function:
def hash(number):
return number % (2 ** 61 - 1)
That's why for small ints you got the same values, while for example hash(2305843009213693950)
returns 2305843009213693950
and hash(2305843009213693951)
returns 0
Upvotes: 9
Reputation: 147
You should use hashlib module:
>>> import hashlib()
>>> m.update(b'abc')
>>> m.hexdigest()
Upvotes: 0
Reputation: 530823
The only purpose of the hash
function is to produce an integer value that can be used to insert an object into a dict. The only thing hash
guarantees is that if a == b
, then hash(a) == hash(b)
. For a user-defined class Foo
, it is the user's responsibility to ensure that Foo.__eq__
and Foo.__hash__
enforce this guarantee.
Anything else is implementation-dependent, and you shouldn't read anything into the value of hash(x)
for any value x
. Specifically, hash(a) == hash(b)
is allowed for a != b
, and hash(x) == x
is not required for any particular x
.
Upvotes: 4