Reputation: 362786
The docs incorrectly claim that
Objects which are instances of user-defined classes are hashable by default; they all compare unequal (except with themselves), and their hash value is their
id()
Although I recall this being correct once, such objects hashing equal to their id is apparently not true in current versions of python (v2.7.10, v3.5.0).
>>> class A:
... pass
...
>>> a = A()
>>> hash(a)
-9223372036578022804
>>> id(a)
4428048072
In another part of the docs it's said that the hash is derived from the id. When/why did the implementation change, and how is the number returned by hash "derived from" the id now?
Upvotes: 4
Views: 1453
Reputation: 1122222
This changed in 2009, as a result of issue #5186; the usual id()
values caused too many collisions:
In the issue 5169 discussion, Antoine Pitrou suggested that for an object
x without a `__hash__` method, `id()/8` might be a better hash value than
`id()`, since dicts use the low order bits of the hash as initial key, and
the 3 lowest bits of an `id()` will always be zero.
The current implementation takes the id and rotates it to produce a more varying value:
long
_Py_HashPointer(void *p)
{
long x;
size_t y = (size_t)p;
/* bottom 3 or 4 bits are likely to be 0; rotate y by 4 to avoid
excessive hash collisions for dicts and sets */
y = (y >> 4) | (y << (8 * SIZEOF_VOID_P - 4));
x = (long)y;
if (x == -1)
x = -2;
return x;
}
This resulted in a 14% to 34% speedup, depending on the test performed.
The glossary is simply out of date; I see you already opened an issue with the project.
Upvotes: 2
Reputation: 2884
The relevant function appears to be:
Py_hash_t
_Py_HashPointer(void *p)
{
Py_hash_t x;
size_t y = (size_t)p;
/* bottom 3 or 4 bits are likely to be 0; rotate y by 4 to avoid
excessive hash collisions for dicts and sets */
y = (y >> 4) | (y << (8 * SIZEOF_VOID_P - 4));
x = (Py_hash_t)y;
if (x == -1)
x = -2;
return x;
}
(that code comes from here, and is then used to be the tp_hash
slot in type
here.) The comment there seems to give a reason for not using the pointer (which is the same thing as the id
) directly. Indeed, the commit that introduced that change to the function is here, and states that the reason for the change is:
Issue #5186: Reduce hash collisions for objects with no hash method by rotating the object pointer by 4 bits to the right.
which refers to this issue, which explains more why the change was made.
Upvotes: 6