Rob

Reputation: 13

Python dataclasses: is it safe to use an id-based hash instead of `unsafe_hash=True`?

I have a dataclass that represents a 2D point:

from dataclasses import dataclass

@dataclass
class Point:
    x: int
    y: int

I want the Point class to behave as follows, so that distinct Point objects compare equal by value but can still be stored as separate entries in a dictionary:

p1 = Point(5, 10)
p2 = Point(5, 10)

p1 == p2  # Should return True
p1 is p2  # Should return False
hash(p1) == hash(p2)  # Should return False, so that they can be stored as different entries in a dict

I could use unsafe_hash=True, e.g.

@dataclass(unsafe_hash=True)
class Point:
    x: int
    y: int

But then equal points collapse into a single dictionary key, e.g.

p1 = Point(5, 10)
p2 = Point(5, 10)

d = {p1: True, p2: False}
d  # Returns {Point(x=5, y=10): False}

Is there any reason why I shouldn't instead implement __hash__ to return a hash based on the object's id, similar to the default implementation of object.__hash__ (ref)?

@dataclass
class Point:
    x: int
    y: int

    def __hash__(self):
        return hash(id(self))

This seems like the simplest solution, and it gives the class the desired behaviour, but it appears to go against the advice in the Python docs:

The only required property is that objects which compare equal have the same hash value

Upvotes: 1

Views: 501

Answers (1)

Silvio Mayolo

Reputation: 70267

You're right that this __hash__ goes against the recommendations in the docs. In fact, it's more than a recommendation: Python's built-in data structures are allowed to rely on that property. If two equal objects hash to different values, then indexing into a dictionary whose keys are those points could return the value for either point, depending entirely on how the dictionary is laid out internally. The current Python implementation may or may not actually do this, but other Python implementations could, and a future version of Python might break your code. Bottom line: it's a bad idea.
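To make the breakage concrete, here is a minimal sketch of what the id-based __hash__ from the question does in practice. It assumes CPython, where a dict compares stored hash values before it ever calls __eq__; other implementations are not obliged to behave the same way, which is exactly the risk described above.

```python
from dataclasses import dataclass


@dataclass
class Point:
    x: int
    y: int

    def __hash__(self):
        # Identity-based hash, as in the question: equal points get
        # different hashes, violating the __eq__/__hash__ contract.
        return hash(id(self))


p1 = Point(5, 10)
p2 = Point(5, 10)

d = {p1: True, p2: False}
print(p1 == p2)            # True: equality is still by value
print(len(d))              # 2 in CPython: equal points, separate keys

# A third, equal-but-distinct point has a different id, hence a
# different hash, so value-based lookup silently misses.
print(Point(5, 10) in d)   # False in CPython

# Sets show the same effect: "duplicates" survive.
print(len({p1, p2}))       # 2 in CPython
```

In other words, you get separate entries, but only by giving up the ability to find an entry from an equal Point, which is usually why you would hash by value in the first place.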

Let me propose a different idea. You want two different things, so consider making two different classes. One class can represent the concrete notion of identity (in both __hash__ and __eq__, for consistency), and a separate class can handle value-based point-to-point equality.

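from dataclasses import dataclass

@dataclass(frozen=True)
class Point:
    x: int
    y: int

# Point must be defined first: the annotation below is evaluated
# when this class body runs. __eq__ and __hash__ are inherited
# from object, so both are identity-based and consistent.
class PointObject:
    point: Point

    def __init__(self, point: Point) -> None:
        self.point = point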

Now your dictionary can use PointObject instances as keys, while all of your business logic operates on the contained Point objects. You will probably want a better name than "PointObject", depending on what these objects actually represent in your domain model.
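A short usage sketch of this two-class design, with Point defined before PointObject so the class-level annotation resolves. The variable names are illustrative only. Two wrappers around equal points stay separate dictionary keys, while the wrapped frozen Point values still compare and hash by value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    x: int
    y: int


class PointObject:
    # No __eq__/__hash__ defined: identity semantics from object.
    point: Point

    def __init__(self, point: Point) -> None:
        self.point = point


a = PointObject(Point(5, 10))
b = PointObject(Point(5, 10))

d = {a: True, b: False}
print(len(d))                          # 2: the wrappers are distinct keys
print(a == b)                          # False: wrappers compare by identity
print(a.point == b.point)              # True: points compare by value
print(hash(a.point) == hash(b.point))  # True: frozen dataclass hashes by value
```

Each class now satisfies the "equal objects have equal hashes" rule on its own terms, so nothing relies on implementation details of dict.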

Upvotes: 1
