varantir

Reputation: 6854

Same hash for same (python) dictionary after restarting interpreter?

As explained in this post, the expression

str(hash(frozenset(kwargs.items())))

yields different results for the same dictionary after the interpreter is restarted. I need the hash for caching, so it has to be identical whenever the dictionary is identical; otherwise caching makes no sense. How can I get an injective hash for every (non-nested) dictionary?

Upvotes: 1

Views: 636

Answers (2)

tschmelz

Reputation: 550

This is not specific to dicts; it happens for any string, because Python 3 salts string hashes with a random per-process value (hash randomization):

❯ python --version
Python 3.10.1
❯ python -c "print(hash('hello'))"
805068502777750074
❯ python -c "print(hash('hello'))"
-8272315863596519132

What you could do instead is use another hashing method such as md5. If you are using an object-oriented approach, you could override the __hash__ method as follows:

# persistent_hash.py
import hashlib
import operator as op


class MyClass:
    def __init__(self, name: str, content: dict):
        self.name = name
        self.content = content

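    def __hash__(self):
        # Concatenate the attribute values, ordered by attribute name,
        # so the result does not depend on assignment order.
        to_be_hashed = "".join(
            str(value) for _, value in sorted(self.__dict__.items(),
                                              key=op.itemgetter(0))
        )
        # md5 is deterministic across runs, unlike the built-in str hash.
        return int.from_bytes(
            hashlib.md5(to_be_hashed.encode("utf-8")).digest(),
            "big"
        )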


if __name__ == "__main__":
    my_class = MyClass(name="awesome", content={"best_number": 42})
    print(hash(my_class))

Sorting __dict__ by key ensures the same hash for the same attribute values of MyClass, even if attributes are added to the instance in a different order. This returns consistent hash values:

❯ python persistent_hash.py
1439132221247659084
❯ python persistent_hash.py
1439132221247659084
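
For the flat dictionary in the question, the same idea also works without a wrapper class. A minimal sketch, assuming all keys are strings and all values are JSON-serializable; `stable_dict_hash` is a hypothetical helper name, and SHA-256 is swapped in for md5 here:

```python
import hashlib
import json


def stable_dict_hash(d: dict) -> str:
    """Return a hex digest that stays the same across interpreter restarts.

    Assumes string keys and JSON-serializable values (illustrative helper).
    """
    # sort_keys=True makes the serialization independent of insertion order;
    # compact separators keep incidental whitespace out of the hashed text.
    canonical = json.dumps(d, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Both calls print the same digest, whatever the insertion order.
    print(stable_dict_hash({"best_number": 42, "name": "awesome"}))
    print(stable_dict_hash({"name": "awesome", "best_number": 42}))
```

No fixed-size digest can be strictly injective, but accidental SHA-256 collisions are not a practical concern for cache keys.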

==========

Fun fact: Python 2.x seems to hash strings consistently, since hash randomization is off by default there:

❯ python2 --version
Python 2.7.18
❯ python2 -c "print(hash('hello'))"
840651671246116861
❯ python2 -c "print(hash('hello'))"
840651671246116861
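
Python 3 can be made to behave the same way by fixing the hash seed through the `PYTHONHASHSEED` environment variable. A small sketch, assuming a CPython 3.7+ interpreter is available at `sys.executable`; `hash_in_subprocess` is a hypothetical helper:

```python
import os
import subprocess
import sys


def hash_in_subprocess(text: str, seed: str) -> int:
    """Compute hash(text) in a fresh interpreter started with a fixed seed."""
    # Copy the current environment and pin the hash seed for the child only.
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", text],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(result.stdout)


if __name__ == "__main__":
    # With the same seed, two separate interpreter runs agree.
    print(hash_in_subprocess("hello", "0") == hash_in_subprocess("hello", "0"))
```

Pinning the seed affects every string hash in the process and gives up the protection that randomization provides, so a content-based digest is probably the better fit for cache keys.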

Upvotes: 1

deets

Reputation: 6395

A simple

":".join(f"{key}={value}" for key, value in sorted(kwargs.items()))

should suffice. Of course, this assumes that the keys are strings and that they have a well-defined sort order; if not, you need to wrap them in str (and make sure the result is meaningful) and/or provide a proper comparison.
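
A sketch of that idea for dictionaries whose keys or values are not strings, assuming `repr` yields a meaningful and deterministic text for each of them; `dict_key` is a hypothetical helper name:

```python
def dict_key(d: dict) -> str:
    """Build a deterministic text key from a flat dictionary.

    Assumes repr() is stable for every key and value (true for str, int,
    float, bool, None and tuples of those; not for sets of strings).
    """
    # repr() keeps 1 and "1" apart, and sorting the repr'd pairs avoids
    # comparing keys of different types directly.
    pairs = sorted((repr(key), repr(value)) for key, value in d.items())
    return ":".join(f"{key}={value}" for key, value in pairs)


if __name__ == "__main__":
    print(dict_key({"b": 2, "a": 1}))  # → 'a'=1:'b'=2
```

The resulting string can serve as the cache key directly, or be fed to hashlib when a fixed-length key is preferred.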

Upvotes: 0
