zym1010

numpy dtype in membership test gives weird results

While writing a program involving numpy, I found that membership tests don't work as expected for numpy dtype objects. Specifically, the result is unexpected for a set, but not for a list or tuple.

import numpy as np
x = np.arange(5).dtype
y = np.int64
print(x in {y}, x in (y,), x in [y])

The result is False True True.

I found this in both Python 2.7 and 3.6, with numpy 1.12.x installed.

Any idea why?

UPDATE

It looks like dtype objects don't respect some of Python's assumptions about hashing.

http://www.asmeurer.com/blog/posts/what-happens-when-you-mess-with-hashing-in-python/

and https://github.com/numpy/numpy/issues/5345

Thanks @user2357112 and @Fabien

Answers (2)

user2357112

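The __hash__ and __eq__ implementations of dtype objects were pretty poorly thought out. Among other problems, the __hash__ and __eq__ implementations aren't consistent with each other: with the x and y in your question, x == y, but hash(x) != hash(y). Equal objects are supposed to have equal hashes, because a set lookup only falls back to __eq__ for entries whose hash matches. You're seeing the effects of that here: the set never compares x with y, while the list and tuple do.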
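To see the mechanism without numpy, here is a minimal pure-Python analogue. Loose and Target are hypothetical classes, not numpy code: Loose claims equality with its target but keeps the default identity-based hash, which is the same kind of __eq__/__hash__ mismatch described above.

```python
class Target:
    """Hypothetical stand-in for np.int64: a plain object with default hashing."""


class Loose:
    """Hypothetical stand-in for a dtype: equal to its target, but hashed by identity."""

    def __init__(self, target):
        self.target = target

    def __eq__(self, other):
        # Claim equality with the target object, like dtype('int64') == np.int64.
        return other is self.target or other is self

    # Defining __eq__ would normally set __hash__ to None, so restore identity
    # hashing explicitly; this breaks the rule "equal objects have equal hashes".
    __hash__ = object.__hash__


y = Target()
x = Loose(y)

print(x == y)                         # True
print(hash(x) == hash(y))             # False: two distinct identity-based hashes
print(x in {y}, x in (y,), x in [y])  # False True True
```

The set result comes from the hash mismatch alone: Loose.__eq__ is never called during the set lookup, while the tuple and list scans call it for every element.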

Some other problems with dtype __hash__ and __eq__ are that

  • dtype objects are actually mutable in ways that affect both __hash__ and __eq__, something that should never be true of a hashable object. (Specifically, you can reassign the names of a structured dtype.)
  • dtype equality isn't transitive. For example, with the x and y in your question, we have x == y and x == 'int64', but y != 'int64'.
  • dtype __eq__ raises TypeError when it should return NotImplemented.
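The mutability problem from the first bullet can be reproduced in pure Python. Record is a hypothetical class whose hash depends on a reassignable names attribute, loosely mimicking renaming the fields of a structured dtype; its __eq__ also follows the NotImplemented convention from the last bullet. This is a sketch of the general pitfall, not of numpy's actual dtype code.

```python
class Record:
    """Hypothetical mutable object whose hash depends on a reassignable field."""

    def __init__(self, names):
        self.names = tuple(names)

    def __eq__(self, other):
        if not isinstance(other, Record):
            # Unknown type: let Python try the reflected comparison instead of raising.
            return NotImplemented
        return self.names == other.names

    def __hash__(self):
        # Hash depends on mutable state, which is exactly what a hashable type should avoid.
        return hash(self.names)


r = Record(["a", "b"])
s = {r}
print(r in s)                        # True

r.names = ("x", "y")                 # mutate after insertion, like renaming dtype fields
print(r in s)                        # False: the lookup now uses a different hash
print(any(item is r for item in s))  # True: the object is still stored in the set
print(r == "x")                      # False: NotImplemented falls back to identity, no TypeError
```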

You could submit a bug report, but looking at existing bug reports relating to those methods, it's unlikely to be fixed. The design is too much of a mess, and people are already relying on the broken parts.

Fabien

The difference lies in how sets implement the in keyword in Python.

Lists (and tuples) simply examine each element, checking for equality. Sets first hash the object being looked up, and only check equality against entries with the same hash.

See also: different meaning of the 'in' keyword for sets and lists

This is because sets must ensure uniqueness. But your two objects are of completely different types:

>>> x
dtype('int64')
>>> y
<class 'numpy.int64'>

So hashing them probably gives different results, even though x == y.
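The two lookup strategies described in this answer can be sketched as a toy model. This is not CPython's real set implementation (which uses open addressing with probing); list_contains, build_table, table_contains and AlwaysEqual are hypothetical names, and the sketch only keeps the essential rule that a set compares with == only when the stored hash matches.

```python
def list_contains(items, key):
    """Roughly how `key in some_list` works: an identity-or-equality scan."""
    return any(item is key or item == key for item in items)


def build_table(items, size=8):
    """Toy hash table: a list of buckets holding (hash, item) pairs."""
    table = [[] for _ in range(size)]
    for item in items:
        h = hash(item)
        table[h % size].append((h, item))
    return table


def table_contains(table, key):
    """Toy set-style lookup: compare full hashes first, then identity/equality."""
    h = hash(key)
    for stored_hash, item in table[h % len(table)]:
        if stored_hash == h and (item is key or item == key):
            return True
    return False


class AlwaysEqual:
    """Hypothetical object that equals everything but keeps identity hashing."""

    def __eq__(self, other):
        return True

    __hash__ = object.__hash__


key = AlwaysEqual()
stored = object()
print(list_contains([stored], key))                # True: equality alone decides
print(table_contains(build_table([stored]), key))  # False: hashes differ, == never runs
```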
