Reputation: 1279
I have a helper class to help with string methods. It has a bunch of methods and variables but I want the underlying hash to based on the contents of its 'main' string. So the class looks something similar to this:
class Topic:
def __init__(self, name):
self.name = name
def getName(self):
return self.name
def setName(self, newName):
self.name = newName
def __str__(self):
return self.name
however I want a dictionary to hash this object as a string so when I do the following code:
a = Topic('test')
v = {a : 'oh hey'}
print(v[Topic('test')])
I want it to print 'oh hey' instead of throwing a key error. I tried doing this to my Topic class:
def __hash__(self):
return hash(self.name)
but it didn't work and I can't find online how Python hashes their strings. Is there anyway to make this work the way I intend? Thanks for any information.
Upvotes: 2
Views: 1418
Reputation: 366103
If you read the documentation on __hash__
, it explains what's going on, and how to fix it:
If a class does not define an
__eq__()
method it should not define a__hash__()
operation either…
If two values hash the same, but aren't equal, they're not the same key as far as a dict is concerned, they're two different values that happened to have a hash collision. So, your Topic
values are still keyed by identity (you can only look up a Topic
with the exact same instance, not another instance with the same name), you're just making it less efficient.
To fix that, you want to add an __eq__
method that makes two Topic
s equal if they have the same name
.
def __eq__(self, other):
return self.name == other.name
But there are two problems with this.
First, your Topic
objects will now hash the same as their names—but they won't be equal to them. That probably isn't what you want.
If you want to be able to look up a topic by just using the string as a key, you need to change the __eq__
method to handle that:
def __eq__(self, other):
return self.name == other or self.name == other.name
Or, if you want two Topic
s with the same name to work like the same key, but not the name itself, you need to change __hash__
to something like this:
def __hash__(self):
return hash((type(self), self.name))
So, two Topic
values with the name 'spam'
will both get hashed as (Topic, "spam")
, and will match each other, but won't match the hash of "spam"
itself.
The second problem is more serious.
Your Topic
objects are mutable. In fact, by using getters and setters (which you usually don't want in Python), you're explicitly calling out that you want people to be able to mutate the name
of a Topic
.
But if you do that, the same Topic
no longer has the same hash value, and no longer equals its original value. This will break any dictionary you'd put it in.
>>> v = {a: 'oh hey'}
>>> a.setName('test2')
>>> v
KeyError: <__main__.Topic object at 0x12370b0b8>
This is covered in the same docs:
If a class defines mutable objects and implements an
__eq__()
method, it should not implement__hash__()
, since the implementation of hashable collections requires that a key’s hash value is immutable (if the object’s hash value changes, it will be in the wrong hash bucket).
This is why the only builtin collections that are hashable are the immutable ones.
Occasionally, this is worth subverting. If you have an type that's mutable in general, but you know you're never going to mutate one of them after it's stored or looked up in a dict, you can, basically, lie to Python and tell it your type is immutable and therefore suitable as a dict key by defining a __hash__
and __eq__
that would break if you mutated the object, but isn't going to break because you're never going to do that.
But usually, you want to follow the rule that if you want something to be a key, it should be immutable.
Usually it's sufficient to just make it "immutable by convention". For example, if you make name
"private by convention" by renaming it to _name
, and get rid of the setName
method and have only getName
, your existing class (with the added __hash__
and __eq__
methods) is fine. Sure, someone could break your dicts by changing the private attribute's value out from under you, but you can expect your users to be "consenting adults" and not do that unless they have a good reason.
One last thing, while we're at it: You almost always want to define a __repr__
for a class like this. Notice the error we got above complained about <__main__.Topic object at 0x12370b0b8>
? Likewise, if you just evaluate a
at the interactive prompt, or print(v)
, even without any problems, the Topic
is going to show up like this. That's because __str__
only affects str
, not repr
. The usual pattern is:
def __repr__(self):
return f"{type(self).__name__}({self.name!r})"
Now, you'll see something like Topic("spam")
instead of <__main__.Topic object at 0x12370b0b8>
.
You may want to take a look at @dataclass
, namedtuple
, or a third-party library like attrs
that can automatically write all of these methods—__init__
, __hash__
, __eq__
, __repr__
, and others—for you, and ensure that they all work together properly.
For example, this could replace your entire class definition:
@dataclass(frozen=True)
class Topic:
name: str
Because it's frozen
, it will use a tuple of its attributes—which is just name
—for hashing and comparisons.
Upvotes: 4
Reputation: 1279
In order to make something in Python custom-made hashable we need to not just give it a custom hash function but also make it able to be compared to another version of its same type so the updated code(that works) is as follows:
class Topic:
def __init__(self, name):
self.name = name;
def getName(self):
return self.name
def setName(self, newName):
self.name = newName
def __str__(self):
return self.name;
def __eq__(self, other):
return self.name == other.name
def __hash__(self):
return hash(self.name)
EDIT:
@abarnert pointed out something very wrong with this approach. See the comments below(or his very thorough answer) to understand why you SHOULD NOT do this. It will work but it is deceivingly dangerous and should be avoided.
Upvotes: 3