Reputation: 1939
I have some trouble understanding how hashability is managed for numpy objects.
>>> import numpy as np
>>> class Vector(np.ndarray):
...     pass
>>> nparray = np.array([0.])
>>> vector = Vector(shape=(1,), buffer=nparray)
>>> ndarray = np.ndarray(shape=(1,), buffer=nparray)
>>> nparray
array([ 0.])
>>> ndarray
array([ 0.])
>>> vector
Vector([ 0.])
>>> '__hash__' in dir(nparray)
True
>>> '__hash__' in dir(ndarray)
True
>>> '__hash__' in dir(vector)
True
>>> hash(nparray)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'numpy.ndarray'
>>> hash(ndarray)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'numpy.ndarray'
>>> hash(vector)
-9223372036586049780
>>> nparray.__hash__()
269709177
>>> ndarray.__hash__()
269702147
>>> vector.__hash__()
-9223372036586049780
>>> id(nparray)
4315346832
>>> id(ndarray)
4315234352
>>> id(vector)
4299616456
>>> nparray.__hash__() == id(nparray)
False
>>> ndarray.__hash__() == id(ndarray)
False
>>> vector.__hash__() == id(vector)
False
>>> hash(vector) == vector.__hash__()
True
How come nparray and ndarray have a __hash__ method but are however not hashable, while vector, a subclass of numpy.ndarray, defines __hash__ and is hashable? Am I missing something?
I'm using Python 2.7.1 and numpy 1.6.1.
Thanks for any help!
EDIT: added the objects' ids.
EDIT2: Following deinonychusaur's comment, and trying to figure out whether hashing is based on content, I played with numpy.ndarray.dtype and found something I find quite strange:
>>> [Vector(shape=(1,), buffer=np.array([1], dtype=mytype), dtype=mytype) for mytype in ('float', 'int', 'float128')]
[Vector([ 1.]), Vector([1]), Vector([ 1.0], dtype=float128)]
>>> [id(Vector(shape=(1,), buffer=np.array([1], dtype=mytype), dtype=mytype)) for mytype in ('float', 'int', 'float128')]
[4317742576, 4317742576, 4317742576]
>>> [hash(Vector(shape=(1,), buffer=np.array([1], dtype=mytype), dtype=mytype)) for mytype in ('float', 'int', 'float128')]
[269858911, 269858911, 269858911]
I'm puzzled... is there some (type-independent) caching mechanism in numpy?
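To check whether this is really caching or just memory reuse (a sketch, assuming CPython semantics: each temporary Vector can be garbage-collected before the next one is built, so its address and id can be recycled), I can keep the objects alive instead:
>>> vectors = [Vector(shape=(1,), buffer=np.array([1], dtype=mytype), dtype=mytype) for mytype in ('float', 'int', 'float128')]
>>> len(set(id(v) for v in vectors))
3
>>> len(set(hash(v) for v in vectors))
3
With all three objects alive at once the ids and hashes are distinct, which points to address reuse rather than a numpy cache.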
Upvotes: 13
Views: 8991
Reputation: 3673
This is not a complete answer, but here are some leads to follow to understand this behavior.
I refer here to the numpy code of the 1.6.1 release.
According to the numpy.ndarray object implementation (look at numpy/core/src/multiarray/arrayobject.c), the hash method, i.e. the tp_hash slot, is set to NULL:
NPY_NO_EXPORT PyTypeObject PyArray_Type = {
#if defined(NPY_PY3K)
    PyVarObject_HEAD_INIT(NULL, 0)
#else
    PyObject_HEAD_INIT(NULL)
    0,                                          /* ob_size */
#endif
    "numpy.ndarray",                            /* tp_name */
    sizeof(PyArrayObject),                      /* tp_basicsize */
    /* ... several slots elided ... */
    &array_as_mapping,                          /* tp_as_mapping */
    (hashfunc)0,                                /* tp_hash */
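Note that the presence of '__hash__' in dir() says nothing about hashability: hash() goes through the type's C-level tp_hash slot. Plain Python lists show the same mismatch (a side check of mine, independent of numpy):
>>> '__hash__' in dir([])
True
>>> hash([])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'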
This tp_hash slot seems to be overridden in numpy/core/src/multiarray/multiarraymodule.c: see the DUAL_INHERIT and DUAL_INHERIT2 macros and the initmultiarray function, where the tp_hash attribute is modified, e.g.:
PyArrayDescr_Type.tp_hash = PyArray_DescrHash
Note that this particular assignment targets PyArrayDescr_Type, the C type behind numpy.dtype, not PyArray_Type itself.
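The effect of that override is visible from Python: dtype objects are hashable, and equal dtypes hash equal (a quick check of mine, not part of the original source walk):
>>> import numpy as np
>>> hash(np.dtype('float64')) == hash(np.dtype('float64'))
True
>>> {np.dtype('int32'): 'usable as a dict key'}
{dtype('int32'): 'usable as a dict key'}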
According to hashdescr.c, the dtype hash is implemented as follows:
* How does this work ? The hash is computed from a list which contains all the
* information specific to a type. The hard work is to build the list
* (_array_descr_walk). The list is built as follows:
* * If the dtype is builtin (no fields, no subarray), then the list
* contains 6 items which uniquely define one dtype (_array_descr_builtin)
* * If the dtype is a compound array, one walk on each field. For each
* field, we append title, names, offset to the final list used for
* hashing, and then append the list recursively built for each
* corresponding dtype (_array_descr_walk_fields)
* * If the dtype is a subarray, one adds the shape tuple to the list, and
* then append the list recursively built for each corresponding type
* (_array_descr_walk_subarray)
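In other words, two compound dtypes built independently but describing the same layout should compare and hash equal (a sketch of mine illustrating the walk described above):
>>> import numpy as np
>>> d1 = np.dtype([('x', np.float64), ('y', np.int32)])
>>> d2 = np.dtype([('x', 'f8'), ('y', 'i4')])
>>> d1 == d2
True
>>> hash(d1) == hash(d2)
True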
Upvotes: 2
Reputation: 3231
I get the same results in Python 2.6.6 and numpy 1.3.0. According to the Python glossary, an object should be hashable if __hash__ is defined (and is not None), and either __eq__ or __cmp__ is defined. ndarray.__eq__ and ndarray.__hash__ are both defined and return something meaningful, so I don't see why hash should fail. After a quick google, I found this post on the python.scientific.devel mailing list, which states that arrays have never been intended to be hashable - so why ndarray.__hash__ is defined, I have no idea. Note that isinstance(nparray, collections.Hashable) returns True.
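As a minimal illustration of the glossary rule (my own sketch, not numpy-related; on Python 3.10+ use collections.abc.Hashable), a class with both __eq__ and a non-None __hash__ is reported hashable:
>>> import collections
>>> class Point(object):
...     def __init__(self, x):
...         self.x = x
...     def __eq__(self, other):
...         return isinstance(other, Point) and other.x == self.x
...     def __hash__(self):
...         return hash(self.x)
...
>>> isinstance(Point(1), collections.Hashable)
True
>>> hash(Point(1)) == hash(Point(1.0))
True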
EDIT: Note that nparray.__hash__() is just the default CPython hash, derived from the object's address (in 2.7 the pointer is rotated by four bits, which is why it equals id(nparray) // 16 above rather than id() itself), so this is just the default implementation. Maybe it was difficult or impossible to remove the implementation of __hash__ in earlier versions of Python (the __hash__ = None technique was apparently introduced in 2.6), so they used some kind of C API magic to achieve this in a way that wouldn't propagate to subclasses, and wouldn't stop you from calling ndarray.__hash__ explicitly?
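For contrast, the Python-level __hash__ = None technique does propagate to subclasses, so that technique alone could not reproduce what we see here, where the Vector subclass ends up hashable again (a sketch of mine, assuming Python 2.6+):
>>> class Base(object):
...     __hash__ = None
...
>>> class Child(Base):
...     pass
...
>>> hash(Child())
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'Child'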
Things are different in Python 3.2.2 and the current numpy 2.0.0 from the repo. The __cmp__ method no longer exists, so hashability now requires __hash__ and __eq__ (see the Python 3 glossary). In this version of numpy, ndarray.__hash__ is defined, but it is just None, so it cannot be called. hash(nparray) fails and isinstance(nparray, collections.Hashable) returns False as expected. hash(vector) also fails.
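This is easy to confirm on a current setup (a sketch of mine against any recent numpy, where ndarray.__hash__ is None; note collections.Hashable now lives in collections.abc and the old alias is gone in Python 3.10+):
>>> import numpy as np
>>> from collections.abc import Hashable
>>> np.ndarray.__hash__ is None
True
>>> isinstance(np.array([0.]), Hashable)
False
>>> class Vector(np.ndarray):
...     pass
...
>>> hash(Vector(shape=(1,), buffer=np.array([0.])))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'Vector'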
Upvotes: 8