Reputation: 110422
Is the hash()
function in python guaranteed to always be the same for a given input, regardless of when/where it's entered? So far -- from trial-and-error only -- the answer seems to be yes, but it would be nice to understand the internals of how this works. For example, in a test:
$ python
>>> from ingest.tpr import *
>>> d=DailyPriceObj(date="2014-01-01")
>>> hash(d)
5440882306090652359
>>> ^D
$ python
>>> from ingest.tpr import *
>>> d=DailyPriceObj(date="2014-01-01")
>>> hash(d)
5440882306090652359
Upvotes: 2
Views: 3020
Reputation: 180
A necessary condition for hashability is that for equivalent objects this value is always the same (inside one run of the interpreter).
Of course, nothing prevents you from ignoring this requirement. But, if you suddently want to store your objects in a dictionary or a set, then problems may arise.
When you implement your own class you can define methods __eq__
and __hash__
. I have used a polynominal hash function for strings and a hash function from a universal family of hash functions.
In general, the hash values for a specific object shouldn't change from start to start interpreter. But for many data types this is true. One of the reasons for this implementation is that it is more difficult to find and anti-hash test.
For numeric types, the hash of a number
x
is based on the reduction ofx
modulo the primeP = 2**_PyHASH_BITS - 1
. It's designed so thathash(x) == hash(y)
wheneverx
andy
are numerically equal, even ifx
andy
have different types.
a = 123456789
hash(a) == 123456789
hash(a + b * (2 ** 61 - 1)) == 123456789
Upvotes: 1
Reputation: 794
No, it's dependent on the process. If you need a persistent hash, see Persistent Hashing of Strings in Python.
Truncation depending on platform, from the documentation of __hash__:
hash() truncates the value returned from an object’s custom __hash__() method to the size of a Py_ssize_t. This is typically 8 bytes on 64-bit builds and 4 bytes on 32-bit builds.
Salted hashes, from the same documentation (ShadowRanger's answer):
By default, the __hash__() values of str, bytes and datetime objects are “salted” with an unpredictable random value. Although they remain constant within an individual Python process, they are not predictable between repeated invocations of Python. This is intended to provide protection against a denial-of-service caused by carefully-chosen inputs that exploit the worst case performance of a dict insertion, O(n^2) complexity. See http://www.ocert.org/advisories/ocert-2011-003.html for details.
Upvotes: 1
Reputation: 155546
The contract for the __hash__
method requires that it be consistent within a given run of Python. There is no guarantee that it be consistent across different runs of Python, and in fact, for the built-in str
, bytes
-like types, and datetime.datetime
objects (possibly others), the hash is salted with a per-run value so that it's almost never the same for the same input in different runs of Python.
Upvotes: 8