Boris Gorelik
Boris Gorelik

Reputation: 31777

Hashing uncertain objects

I need to design object that support some sort of uncertainty (or wild characters, if you wish) it its components. The work is done in Python.

Consider the following class

class C():
    def __init__(self, p1):
        self.p1 = p1

The property p1 can be either "x", "y", "z", but sometimes "x or y", or any other combination.

It is required that if p1 of c1 is 'x' and p1 of c2 is 'x or y', then c1 == c2 will return True. This is easily achieved by supplying a proper __eq__ function. However, these objects need also to be stored in sets, therefore I need to supply a __hash__ function. How would you calculate hash function for this case, such if c1 == c2 then hash(c1) == hash(c2)?

Option 1: hashing the property

Not good Here's why

c1 = C('x')
c2  = C('x or y or z')
c1 == c2 #True
hash(c1) == hash(c2)#False

Upvotes: 1

Views: 103

Answers (4)

GaretJax
GaretJax

Reputation: 7780

The easiest solution would be to make all your objects return the same hash. This degrades the set O(1) performance to O(n) as the contained objects will all be inserted into the same slot. The discrimination will then be done using the __eq__ method.

Regarding your requirements, David has already replied extensively.

Upvotes: 0

David Z
David Z

Reputation: 131600

It is required that if p1 of c1 is 'x' and p1 of c2 is 'x or y', then c1 == c2 will return True.

That's very sketchy (i.e. probably bad) design. Equality is supposed to be transitive, so that if c1 == c2 and c2 == c3, then c1 == c3. Now, your specification requires that C('x') == C('x or y') and C('x or y') == C('y'), which should imply that C('x') == C('y') - but you probably don't want that to be true. (And I see that you figured that out while I was writing this.)

What I would suggest is that you leave __eq__ alone and use a completely different method to perform these "fuzzy" comparisons, perhaps something like is_compatible_with. Or if you are going to reimplement __eq__, at least make it something sensible that obeys the transitive property, such as just comparing the string arguments. That may mean that __eq__ isn't very useful for your particular application, but that's okay; that's why you can create other methods.

Upvotes: 0

Boris Gorelik
Boris Gorelik

Reputation: 31777

I have just realized that my design requirement is faulty. The requirement that C('x') == C('x or y') is True and that C('y') == C('x or y') is True will also require that C('x') == C('y') will also be True. Looks that I need to re-think my design and maybe give up the ability to have object hashes. What do you think?

Upvotes: 0

phihag
phihag

Reputation: 287885

Your equality criterion is not transitive, and therefore invalid:

C('x') == C('x or y') == C('y')

but

C('x') != C('y')

Since you can construct an element that equals all others C('x or y or z or a or ...'), the only hash function that fulfills c1 == c2 ⇒ hash(c1) == hash(c2) is a constant one, i.e.

def __hash__(self):
    return 0

Upvotes: 1

Related Questions