alex
alex

Reputation: 11400

Inconsistent hash output on tuples

Comparing hashes for some two-element tuples

for i in range(11):
    print(i, hash((i,i)) == hash((-i,-i)))

I expected to get True for when i==0 and False for the rest. I was surprised to see this:

0 True
1 False
2 True
3 True
4 True
5 True
6 True
7 True
8 False
9 True
10 True

Why is this happening?

AFAIK it is not the same issue as in this question, since it's not about order but the values themselves.

Upvotes: 4

Views: 607

Answers (2)

alex
alex

Reputation: 11400

For future reference, I ended up using this code for consistent tuples hashing:

import hashlib
import pickle

def hash2(data):
    bytez = pickle.dumps(data)
    hashed = hashlib.md5(bytez)
    return int(hashed.hexdigest(), 16)

Upvotes: 0

blhsing
blhsing

Reputation: 106618

Hash values are never guaranteed to be collision-free even though good hashing algorithms strive to make them so. Python 3.8 has improved on the hashing algorithm particularly for tuples so the issue in the question has now become much harder to reproduce in Python 3.8 compared to Python 3.7.

Excerpt from the changelog for Python 3.8.0 alpha 1:

bpo-34751: The hash function for tuples is now based on xxHash which gives better collision results on (formerly) pathological cases. Additionally, on 64-bit systems it improves tuple hashes in general. Patch by Jeroen Demeyer with substantial contributions by Tim Peters.

Upvotes: 4

Related Questions