Pookie

Reputation: 1279

Hashing a class identically to a string in Python

I have a helper class to help with string methods. It has a bunch of methods and variables, but I want the underlying hash to be based on the contents of its 'main' string. The class looks something like this:

class Topic:

    def __init__(self, name):
        self.name = name

    def getName(self):
        return self.name

    def setName(self, newName):
        self.name = newName

    def __str__(self):
        return self.name

However, I want a dictionary to hash this object as a string, so that when I run the following code:

a = Topic('test')
v = {a : 'oh hey'}

print(v[Topic('test')])

I want it to print 'oh hey' instead of raising a KeyError. I tried adding this to my Topic class:

def __hash__(self):
    return hash(self.name)

but it didn't work, and I can't find anything online about how Python hashes its strings. Is there any way to make this work the way I intend? Thanks for any information.

Upvotes: 2

Views: 1418

Answers (2)

abarnert

Reputation: 366103

If you read the documentation on __hash__, it explains what's going on, and how to fix it:

If a class does not define an __eq__() method it should not define a __hash__() operation either…

If two values hash the same but aren't equal, they're not the same key as far as a dict is concerned; they're two different values that happen to have a hash collision. So your Topic values are still keyed by identity (you can only look up a Topic with the exact same instance, not with another instance that has the same name). All you've done is make lookups less efficient.
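A minimal sketch of that situation, assuming a stripped-down version of the question's Topic with only __hash__ added. It shows two instances whose hashes collide but which still compare unequal, so the dict treats them as different keys:

```python
class Topic:
    def __init__(self, name):
        self.name = name

    def __hash__(self):
        # Same hash as the underlying string, but no __eq__ is defined,
        # so equality still falls back to identity (object.__eq__).
        return hash(self.name)


a = Topic('test')
b = Topic('test')
v = {a: 'oh hey'}

print(hash(a) == hash(b))  # True: the hashes collide
print(a == b)              # False: different instances
print(b in v)              # False: equal hash, unequal key
print(v[a])                # oh hey: only the original instance works
```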

To fix that, you want to add an __eq__ method that makes two Topics equal if they have the same name.

def __eq__(self, other):
    if not isinstance(other, Topic):
        return NotImplemented  # avoids AttributeError on other.name
    return self.name == other.name

But there are two problems with this.


First, your Topic objects will now hash the same as their names—but they won't be equal to them. That probably isn't what you want.

If you want to be able to look up a topic by just using the string as a key, you need to change the __eq__ method to handle that:

def __eq__(self, other):
    if isinstance(other, Topic):
        return self.name == other.name
    return self.name == other  # e.g. a plain string
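As a rough illustration of the string-lookup variant, here is a self-contained sketch. It writes the comparison with an isinstance check, which is an assumption about how you'd want non-Topic values handled, and keeps the hash(self.name) from the question:

```python
class Topic:
    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        # Compare by name against another Topic, otherwise against the raw value.
        if isinstance(other, Topic):
            return self.name == other.name
        return self.name == other

    def __hash__(self):
        # Must match hash(str) so that a plain string finds the same entry.
        return hash(self.name)


v = {Topic('test'): 'oh hey'}
print(v[Topic('test')])  # oh hey
print(v['test'])         # oh hey
print('other' in v)      # False
```

Both lookups work because the Topic and its name hash the same and compare equal.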

Or, if you want two Topics with the same name to work like the same key, but not the name itself, you need to change __hash__ to something like this:

def __hash__(self):
    return hash((type(self), self.name))

So, two Topic values with the name 'spam' will both get hashed as (Topic, "spam"), and will match each other, but won't match the hash of "spam" itself.
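A short sketch of this second variant, assuming an __eq__ that only accepts other Topic instances (the NotImplemented return is my choice here, not something the question's code does):

```python
class Topic:
    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        # Only another Topic can be equal; let Python handle everything else.
        if not isinstance(other, Topic):
            return NotImplemented
        return self.name == other.name

    def __hash__(self):
        # Mixing in the type keeps Topic hashes apart from plain-string hashes.
        return hash((type(self), self.name))


v = {Topic('spam'): 'eggs'}
print(v[Topic('spam')])         # eggs
print('spam' in v)              # False
print(Topic('spam') == 'spam')  # False
```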


The second problem is more serious.

Your Topic objects are mutable. In fact, by using getters and setters (which you usually don't want in Python), you're explicitly calling out that you want people to be able to mutate the name of a Topic.

But if you do that, the same Topic no longer has the same hash value, and no longer equals its original value. This will break any dictionary you'd put it in.

>>> v = {a: 'oh hey'}
>>> a.setName('test2')
>>> v[a]
KeyError: <__main__.Topic object at 0x12370b0b8>
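The same breakage can be reproduced as a script. This is a sketch that assumes the name-based __eq__ and __hash__ discussed above, and it looks the entry up with fresh Topic instances, which is the use case from the question:

```python
class Topic:
    def __init__(self, name):
        self.name = name

    def setName(self, newName):
        self.name = newName

    def __eq__(self, other):
        if not isinstance(other, Topic):
            return NotImplemented
        return self.name == other.name

    def __hash__(self):
        return hash(self.name)


a = Topic('test')
v = {a: 'oh hey'}
print(Topic('test') in v)   # True

a.setName('test2')          # mutate the key while it is stored in the dict

print(Topic('test') in v)   # False: old hash matches, but the names no longer do
print(Topic('test2') in v)  # False: names match, but the entry is filed under the old hash
print(len(v))               # 1: the entry still exists, it just can't be found by name
```

The dict remembers the hash the key had when it was inserted, so after the rename neither the old name nor the new one leads back to the entry.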

This is covered in the same docs:

If a class defines mutable objects and implements an __eq__() method, it should not implement __hash__(), since the implementation of hashable collections requires that a key’s hash value is immutable (if the object’s hash value changes, it will be in the wrong hash bucket).

This is why the only builtin collections that are hashable are the immutable ones.

Occasionally, this is worth subverting. If you have a type that's mutable in general, but you know you're never going to mutate an instance after it's stored or looked up in a dict, you can basically lie to Python and tell it your type is immutable, and therefore suitable as a dict key. You do that by defining a __hash__ and __eq__ that would break if you mutated the object, but that aren't going to break because you're never going to do that.

But usually, you want to follow the rule that if you want something to be a key, it should be immutable.

Usually it's sufficient to just make it "immutable by convention". For example, if you make name "private by convention" by renaming it to _name, and get rid of the setName method and have only getName, your existing class (with the added __hash__ and __eq__ methods) is fine. Sure, someone could break your dicts by changing the private attribute's value out from under you, but you can expect your users to be "consenting adults" and not do that unless they have a good reason.
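One way the "immutable by convention" version might look. This is only a sketch: the isinstance check in __eq__ is my addition, and nothing here actually stops someone from assigning to _name:

```python
class Topic:
    def __init__(self, name):
        self._name = name  # "private by convention": not meant to be reassigned

    def getName(self):
        return self._name

    def __str__(self):
        return self._name

    def __eq__(self, other):
        if not isinstance(other, Topic):
            return NotImplemented
        return self._name == other._name

    def __hash__(self):
        return hash(self._name)


v = {Topic('test'): 'oh hey'}
print(v[Topic('test')])         # oh hey
print(Topic('test').getName())  # test
```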


One last thing, while we're at it: You almost always want to define a __repr__ for a class like this. Notice the error we got above complained about <__main__.Topic object at 0x12370b0b8>? Likewise, if you just evaluate a at the interactive prompt, or print(v), even without any problems, the Topic is going to show up like this. That's because __str__ only affects str, not repr. The usual pattern is:

def __repr__(self):
    return f"{type(self).__name__}({self.name!r})"

Now, you'll see something like Topic('spam') instead of <__main__.Topic object at 0x12370b0b8>.
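A quick sketch of the difference between str and repr, using a cut-down Topic that has only the methods relevant here:

```python
class Topic:
    def __init__(self, name):
        self.name = name

    def __str__(self):
        return self.name

    def __repr__(self):
        # !r applies repr() to the name, so the quotes are included.
        return f"{type(self).__name__}({self.name!r})"


a = Topic('spam')
print(str(a))         # spam
print(repr(a))        # Topic('spam')
print({a: 'oh hey'})  # {Topic('spam'): 'oh hey'}
```

Containers such as dicts and lists use repr for their elements, which is why the last line benefits from __repr__ and not from __str__.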


You may want to take a look at @dataclass, namedtuple, or a third-party library like attrs that can automatically write all of these methods—__init__, __hash__, __eq__, __repr__, and others—for you, and ensure that they all work together properly.

For example, this could replace your entire class definition:

from dataclasses import dataclass

@dataclass(frozen=True)
class Topic:
    name: str

A dataclass generates __eq__ by default, and because this one is also frozen, it gets a matching __hash__ too. Both are based on a tuple of its fields, which here is just name.
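To see how the frozen dataclass behaves as a dict key, here is a small sketch. It uses only the standard-library dataclasses module; the printed message in the except branch is just illustrative:

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Topic:
    name: str


v = {Topic('test'): 'oh hey'}
print(v[Topic('test')])  # oh hey
print(Topic('test'))     # Topic(name='test')

try:
    Topic('test').name = 'test2'  # frozen instances reject attribute assignment
except FrozenInstanceError:
    print('cannot rename a frozen Topic')
```

Because assignment raises, a Topic can't change its hash while it sits in a dict, which removes the mutation problem described earlier.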

Upvotes: 4

Pookie

Reputation: 1279

To make a custom class hashable by value in Python, it's not enough to give it a custom hash function. It also has to be comparable to other instances of the same type. The updated code (which works) is as follows:

class Topic:

    def __init__(self, name):
        self.name = name

    def getName(self):
        return self.name

    def setName(self, newName):
        self.name = newName

    def __str__(self):
        return self.name

    def __eq__(self, other):
        return self.name == other.name

    def __hash__(self):
        return hash(self.name)

EDIT:

@abarnert pointed out something very wrong with this approach. See the comments below (or his very thorough answer) to understand why you SHOULD NOT do this. It will work, but it is deceptively dangerous and should be avoided.

Upvotes: 3
