Reputation: 13
I have a dataclass that represents a 2D point:
from dataclasses import dataclass

@dataclass
class Point:
    x: int
    y: int
I want the Point class to have the following behaviour, so that different point objects can be compared based on value, but also stored separately in a dictionary:
p1 = Point(5, 10)
p2 = Point(5, 10)
p1 == p2 # Should return True
p1 is p2 # Should return False
hash(p1) == hash(p2) # Should return False, so that they can be stored as different entries in a dict
I could use unsafe_hash=True, e.g.
@dataclass(unsafe_hash=True)
class Point:
    x: int
    y: int
But then equal points hash the same and collapse into a single entry when stored in a dictionary. E.g.
p1 = Point(5, 10)
p2 = Point(5, 10)
d = {p1: True, p2: False}
d # Returns {Point(x=5, y=10): False}
Is there any reason why I shouldn't instead implement the __hash__ method to return a hash based on the id of the object, similar to the implementation of object.__hash__ (ref)?
@dataclass
class Point:
    x: int
    y: int

    def __hash__(self):
        return hash(id(self))
This seems like the simplest solution, and gives the class the desired behaviour, but it seems to go against the advice in the Python docs:
The only required property is that objects which compare equal have the same hash value
Upvotes: 1
Views: 501
Reputation: 70267
You're right that this __hash__ goes against the recommendation in the docs. In fact, it's more than a recommendation: Python's built-in data structures are allowed to rely on that invariant. If you have two equal objects which hash to different values, then indexing into a dictionary whose keys are those points could return either point, depending entirely on how the dictionary is stored internally. The current Python implementation may or may not actually do this, but other Python implementations could, and a future version of Python might break your code. Bottom line: it's a bad idea.
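To make the inconsistency concrete, here is a small sketch using the id-based __hash__ from the question. The behaviour in the comments is what I'd expect on current CPython; as noted above, it is an implementation detail and not something the language promises.

```python
from dataclasses import dataclass


@dataclass
class Point:
    x: int
    y: int

    def __hash__(self):
        # Identity-based hash, as proposed in the question.
        return hash(id(self))


p1 = Point(5, 10)
p2 = Point(5, 10)
d = {p1: "first"}

# The two points are equal by value...
equal_by_value = p1 == p2

# ...and a list agrees, because list membership only uses ==.
found_in_list = p2 in [p1]

# A dict checks the hash before it ever calls __eq__, so on CPython
# the equal key is not found here.
found_in_dict = p2 in d
```

So the same pair of objects counts as "the same" for a list and "different" for a dict, which is exactly the kind of surprise the hash/eq rule is meant to prevent.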
Let me propose a different idea. You want two different things, so consider making two different classes. One class can represent the concrete notion of identity (both in __hash__ and in __eq__, for consistency). Then have a separate class for the point-to-point equality.
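from dataclasses import dataclass

# Value type: equality and hash are both based on (x, y).
@dataclass(frozen=True)
class Point:
    x: int
    y: int

# Identity type: keeps object's default identity-based __eq__ and __hash__.
# Defined after Point so the annotation below resolves.
class PointObject:
    point: Point

    def __init__(self, point: Point) -> None:
        self.point = point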
Now your dictionary can have PointObject instances as keys, but you can do all of your business logic on the contained Point objects. Obviously, you might pick a better name than "PointObject", depending on what they actually represent in your domain model.
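As a rough usage sketch of the two-class idea (the variable names a, b and d are just for illustration), two wrappers around equal points stay separate dictionary keys, while the points themselves still compare by value:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    x: int
    y: int


class PointObject:
    # No __eq__ or __hash__ defined, so both fall back to object's
    # identity-based defaults and stay consistent with each other.
    def __init__(self, point: Point) -> None:
        self.point = point


a = PointObject(Point(5, 10))
b = PointObject(Point(5, 10))

# Two distinct wrappers give two distinct dictionary entries.
d = {a: True, b: False}

# Value comparison happens on the wrapped points.
same_location = a.point == b.point

# Identity comparison happens on the wrappers.
same_object = a == b
```

Here len(d) is 2, same_location is True and same_object is False, and the hash/eq contract holds for both classes.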
Upvotes: 1