Ajayc

Reputation: 843

Different __hash__ and __eq__ methods for use in set

I'm trying to implement a set-based solution for a problem and have been running into some issues.

The problem is: I have 2 sets of Group objects. These sets should be unique on email (so we can check if an object from one set is in the other set).

However, two Group objects should not compare equal under __eq__() if only their emails match (for example, one set may contain an updated Group object that has a new description). The goal is a set on which I can perform set operations (intersection and difference) based only on the email field, and then check equality based on the other fields (description and name).

    class Group:
        def __init__(self, name, email, description):
            self.name = name
            self.email = email
            self.description = description

        def __hash__(self):
            return hash(self.email)

        def __eq__(self, other):
            # Parentheses are required for the expression to span several lines.
            return (self.email == other.email
                    and self.description == other.description
                    and self.name == other.name)

        def __ne__(self, other):
            return not self.__eq__(other)

        def __str__(self):
            return "Description: {0} Email: {1} Name: {2}".format(self.description, self.email, self.name)

So I'd expect all assert statements to pass here:

    group_1 = Group('first test group', '[email protected]', 'example description')
    group_2 = Group('second test group', '[email protected]', 'example description')
    group_3 = Group('third group', '[email protected]', 'example description')
    group_5 = Group('updated name', '[email protected]', 'example description')

    group_set = set([group_1, group_2, group_3])
    group_set_2 = set([group_3, group_5])

    self.assertTrue(group_5 in group_set.intersection(group_set_2))
    self.assertEqual(2, len(group_set))
    self.assertTrue(group_5 in group_set)

Upvotes: 1

Views: 2748

Answers (1)

Blckknght

Reputation: 104722

Python's set type uses the equality test implemented by an object's __eq__ method to decide whether an object is "the same" as one already in the set. The __hash__ method only lets it find candidate elements to compare against more efficiently. So your hope of using a __hash__ method based on a different set of attributes than the __eq__ method will not work: multiple unequal objects with the same hash value can exist in the same set (though the set will be somewhat less efficient due to the hash collisions).
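To make that concrete, here is a minimal sketch. The Item class and its key and payload attributes are hypothetical names used only for illustration; it hashes on one attribute but compares on two, much like the Group class in the question.

```python
class Item:
    """Hypothetical class: hashes on key only, compares on key and payload."""

    def __init__(self, key, payload):
        self.key = key
        self.payload = payload

    def __hash__(self):
        return hash(self.key)

    def __eq__(self, other):
        if not isinstance(other, Item):
            return NotImplemented
        return (self.key, self.payload) == (other.key, other.payload)


a = Item("k", "old")
b = Item("k", "new")   # same hash as a, but not equal to it
c = Item("k", "old")   # equal to a

items = {a, b}

collide = hash(a) == hash(b)   # the hashes match...
size = len(items)              # ...yet the set keeps both objects
c_found = c in items           # membership needs a full __eq__ match
```

Because a and b are unequal, the set treats them as distinct elements even though they land in the same hash bucket, so set operations cannot be keyed on the email alone.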

If you want a unique mapping from an email address to a Group, I suggest using a dictionary where the keys are email addresses and the values are Group objects. This will let you ensure the email addresses are unique, while also letting you compare Group objects in whatever way is most appropriate.
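A rough sketch of that idea, assuming Python 3 (where dictionary key views support set operators). The by_email helper and the example.com addresses are made up for illustration and are not taken from the question.

```python
class Group:
    def __init__(self, name, email, description):
        self.name = name
        self.email = email
        self.description = description

    def __eq__(self, other):
        # Full comparison on every field; the email is only used as a dict key.
        if not isinstance(other, Group):
            return NotImplemented
        return ((self.name, self.email, self.description)
                == (other.name, other.email, other.description))


def by_email(groups):
    """Hypothetical helper: index Group objects by their email address."""
    return {g.email: g for g in groups}


old = by_email([
    Group("first test group", "a@example.com", "example description"),
    Group("third group", "c@example.com", "example description"),
])
new = by_email([
    Group("updated name", "c@example.com", "example description"),
    Group("fourth group", "d@example.com", "example description"),
])

shared = old.keys() & new.keys()    # emails present in both
removed = old.keys() - new.keys()   # emails only in old
changed = sorted(e for e in shared if old[e] != new[e])  # same email, different fields
```

The set operations run on the emails, and the equality check on the remaining fields happens separately on the looked-up values, which is the split the question asks for.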

To perform a union between two such dictionaries, use the update method on a copy of one dictionary:

    union = dict_1.copy()
    union.update(dict_2)

For an intersection, use a dictionary comprehension:

    intersection = {email: group for email, group in dict_2.items() if email in dict_1}

Both of those operations will prefer the values from dict_2 over the values from dict_1 wherever the same email occurs as a key in both. If you want it to work the other way, just switch the dictionary names around.
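The following small example illustrates which values win. It uses plain strings in place of Group objects and made-up example.com addresses. The difference comprehension at the end is an extension in the same style rather than part of the snippets above, and .items() is used so the code also runs on Python 3.

```python
dict_1 = {"a@example.com": "a from dict_1", "c@example.com": "c from dict_1"}
dict_2 = {"c@example.com": "c from dict_2", "d@example.com": "d from dict_2"}

# Union: start from dict_1, then let dict_2 overwrite shared keys.
union = dict_1.copy()
union.update(dict_2)

# Intersection that keeps dict_2's values.
intersection = {email: group for email, group in dict_2.items() if email in dict_1}

# Swapping the names keeps dict_1's values instead.
intersection_swapped = {email: group for email, group in dict_1.items() if email in dict_2}

# Difference (assumed extension): entries of dict_1 whose email is not in dict_2.
difference = {email: group for email, group in dict_1.items() if email not in dict_2}
```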

Upvotes: 1
