kaktus_car

Reputation: 986

How to override "set" builtin?

I want to implement the following functionality:

  1. TestClass.values accepts an arbitrary number of NewClass objects
  2. A NewClass object is added to TestClass.values only if no object already stored has the same values for all of its attributes

I've come up with this:

class NewClass:

    def __init__(self, value1, value2):
        self.value1 = value1
        self.value2 = value2


class TestClass:

    def __init__(self, *values):
        self.values = self._set(values)

    def _set(self, object_list):
        unique_dict = {}
        for obj in object_list:
            if list(obj.__dict__.values()) not in unique_dict.values():
                unique_dict[obj] = list(obj.__dict__.values())
        return list(unique_dict.keys())


obj1 = NewClass(1, 2)
obj2 = NewClass(1, 2)
obj3 = NewClass(5, 2)

test = TestClass(obj1, obj2, obj3)

Only obj1 and obj3 end up in test.values.

I am wondering how to do this the "protocol" way, i.e. with special methods, as one does for __len__ or __add__:

def __len__(self):
    return len(self.values)

And does the second approach have meaningful benefits compared to the first one?

Upvotes: 2

Views: 349

Answers (3)

flakes

Reputation: 23624

Just to add to both of these answers... Using a frozen dataclass can avoid a lot of the boilerplate. Not only does it generate __hash__, __eq__, and __repr__ for you, but it also enforces immutability for the lifetime of the object.

Writing __hash__ and __eq__ is not conceptually hard, but it is notoriously easy to get wrong. When a class definition changes (attributes are added or removed, attribute types change, and so on), the hand-written hashing and equality methods can silently fall out of sync with the actual attributes.

To me, this issue is the biggest motivation for using dataclasses. You create concise, simple, immutable types that are easy to hash. You leave the tedious work of listing and comparing attributes to the dataclass decorator and only have to work with the more human-readable form of the class.

from dataclasses import dataclass

@dataclass(frozen=True)
class NewClass:
    value1: int
    value2: int

obj1 = NewClass(1, 2)
obj2 = NewClass(1, 2)
obj3 = NewClass(5, 2)

test = {obj1, obj2, obj3}
print(test)
# {NewClass(value1=1, value2=2), NewClass(value1=5, value2=2)}
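
As a small sketch of the immutability point (it reuses the NewClass definition above; the variable names `seen`, `mutated`, and `found` are only for illustration): assigning to a field of a frozen dataclass raises dataclasses.FrozenInstanceError, so an instance cannot change its hash while it sits in a set.

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class NewClass:
    value1: int
    value2: int

obj = NewClass(1, 2)
seen = {obj}

try:
    obj.value1 = 99  # frozen=True blocks attribute assignment
    mutated = True
except FrozenInstanceError:
    mutated = False

# Equal field values give equal hashes, so an equal instance is found in the set.
found = NewClass(1, 2) in seen
print(mutated, found)
```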

Upvotes: 3

Charles Duffy

Reputation: 295315

Assuming your value1 and value2 are immutable (integers, strings and tuples are fine; lists and dicts are not), you can hash them -- implementing both __hash__ and __eq__ will allow the built-in set type to identify duplicates.

class NewClass:
    def __init__(self, value1, value2):
        self.value1 = value1
        self.value2 = value2
    def __hash__(self):
        return hash((self.value1, self.value2))
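    def __eq__(self, other):
        # Let Python handle comparisons with unrelated types
        if not isinstance(other, NewClass):
            return NotImplemented
        return self.value1 == other.value1 and self.value2 == other.value2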
    def __repr__(self):
        return 'NewClass(%r, %r)' % (self.value1, self.value2)

print(set([NewClass(1,2), NewClass(1,2), NewClass(3,4)]))

...properly returns:

{NewClass(1, 2), NewClass(3, 4)}
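
To illustrate the caveat about immutable values, here is a minimal sketch that reuses the class above (the names `ok`, `bad`, and `hashable` are only for illustration): if one of the values is a list, hash() on the tuple raises TypeError, so the instance cannot be placed in a set at all.

```python
class NewClass:
    def __init__(self, value1, value2):
        self.value1 = value1
        self.value2 = value2
    def __hash__(self):
        return hash((self.value1, self.value2))
    def __eq__(self, other):
        if not isinstance(other, NewClass):
            return NotImplemented
        return (self.value1, self.value2) == (other.value1, other.value2)

ok = {NewClass(1, (2, 3))}          # tuple value: hashable

try:
    bad = {NewClass(1, [2, 3])}     # list value: hash() raises TypeError
    hashable = True
except TypeError:
    hashable = False

print(len(ok), hashable)
```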

Upvotes: 3

Mark

Reputation: 92440

If you define __hash__ and __eq__ on NewClass, you can pass instances to set() and it will use these methods to decide whether two objects count as the same element. Be careful with mutable instances: if an attribute changes after the object has been added to a set, its hash changes too and the set can no longer find it reliably.

Here's a simple example:

class NewClass:
    def __init__(self, value1, value2):
        self.value1 = value1
        self.value2 = value2
    def __hash__(self):
        # take the hash of the tuple
        return hash((self.value1, self.value2))
    def __eq__(self, other):
        # are the tuples equal?
        if not isinstance(other, NewClass):
            return NotImplemented
        return (self.value1, self.value2) == (other.value1, other.value2)

    def __repr__(self):
        return f'NewClass({self.value1!r}, {self.value2!r})'

class TestClass:
    def __init__(self, *values):
        self.values = list(set(values))


obj1 = NewClass(1, 2)
obj2 = NewClass(1, 2)
obj3 = NewClass(5, 2)

test = TestClass(obj1, obj2, obj3)

test.values
# Only the different instances:
# [NewClass(1, 2), NewClass(5, 2)]
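
On the "protocol" part of the question, one possible sketch (not the only design): keep the deduplicated objects in a list and expose them through __len__, __iter__, and __contains__. Here dict.fromkeys is swapped in for set() so that first-seen order is preserved, as in the question's original _set method. That swap, and the choice of which special methods to add, are my assumptions rather than anything the question requires.

```python
class NewClass:
    def __init__(self, value1, value2):
        self.value1 = value1
        self.value2 = value2
    def __hash__(self):
        return hash((self.value1, self.value2))
    def __eq__(self, other):
        if not isinstance(other, NewClass):
            return NotImplemented
        return (self.value1, self.value2) == (other.value1, other.value2)
    def __repr__(self):
        return f'NewClass({self.value1!r}, {self.value2!r})'

class TestClass:
    def __init__(self, *values):
        # dict keys are unique and keep insertion order (Python 3.7+)
        self.values = list(dict.fromkeys(values))
    def __len__(self):
        return len(self.values)
    def __iter__(self):
        return iter(self.values)
    def __contains__(self, item):
        return item in self.values

test = TestClass(NewClass(1, 2), NewClass(1, 2), NewClass(5, 2))
print(len(test), list(test), NewClass(5, 2) in test)
```

Compared with the hand-rolled _set, the rule for "same object" lives in NewClass itself, so len(test), iteration, and the in operator all agree with what a set would do.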

Upvotes: 2
