Reputation: 986
I want to implement the following functionality:
- TestClass.values accepts an arbitrary number of NewClass objects
- Only NewClass objects which do not have all the same attribute values get added to TestClass.values
I've come up with this:
class NewClass:
    def __init__(self, value1, value2):
        self.value1 = value1
        self.value2 = value2

class TestClass:
    def __init__(self, *values):
        self.values = self._set(values)

    def _set(self, object_list):
        # Map each kept object to its attribute values; skip any object
        # whose attribute values have already been seen.
        unique_dict = {}
        for obj in object_list:
            if list(obj.__dict__.values()) not in unique_dict.values():
                unique_dict[obj] = list(obj.__dict__.values())
        return list(unique_dict.keys())

obj1 = NewClass(1, 2)
obj2 = NewClass(1, 2)
obj3 = NewClass(5, 2)
test = TestClass(obj1, obj2, obj3)
Only obj1 and obj3 end up in test.values.
I am wondering how to do this in a "protocol" way, as is done for len or add, etc.:
def __len__(self):
    return len(self.values)
And does the second approach have meaningful benefits compared to the first one?
Upvotes: 2
Views: 349
Reputation: 23624
Just to add to both of these answers... Using a frozen dataclass can avoid a lot of the boilerplate. Not only does it generate __hash__, __eq__, and __repr__ for you, but it also enforces immutability for the lifetime of the object.
Writing __hash__ and __eq__ is not conceptually hard, but it is notoriously easy to get wrong. Updates to the class definition, such as adding or removing attributes or changing attribute data types, can leave the hashing and equality methods out of sync with the class attributes.
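A minimal sketch of how that drift can happen. StaleClass and value3 are hypothetical names made up for illustration: the attribute was added later, but the hand-written methods were never updated, so objects that differ only in value3 are silently treated as duplicates.

```python
class StaleClass:
    """Hypothetical class: value3 was added after __hash__/__eq__ were written."""

    def __init__(self, value1, value2, value3):
        self.value1 = value1
        self.value2 = value2
        self.value3 = value3  # new attribute, not covered by the methods below

    def __hash__(self):
        # Still hashes only the two original attributes.
        return hash((self.value1, self.value2))

    def __eq__(self, other):
        if not isinstance(other, StaleClass):
            return NotImplemented
        # Still compares only the two original attributes.
        return (self.value1, self.value2) == (other.value1, other.value2)


a = StaleClass(1, 2, "first")
b = StaleClass(1, 2, "second")

unique = {a, b}
print(len(unique))  # 1, although a and b differ in value3
```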
To me, this issue is the biggest motivation for using dataclasses. You create concise, simple, immutable types that are easy to hash. You leave the tedious work of listing and comparing attributes to the dataclass decorator and only have to work with the more human-readable form of the class.
from dataclasses import dataclass

@dataclass(frozen=True)
class NewClass:
    value1: int
    value2: int

obj1 = NewClass(1, 2)
obj2 = NewClass(1, 2)
obj3 = NewClass(5, 2)
test = {obj1, obj2, obj3}
print(test)
{NewClass(value1=1, value2=2), NewClass(value1=5, value2=2)}
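To illustrate the immutability point, here is a small sketch of what happens when you try to change a frozen instance. The names seen, mutated, and changed are only for this example. Assignment raises dataclasses.FrozenInstanceError, and dataclasses.replace builds a modified copy instead, so an object already stored in a set keeps the hash it was stored under.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class NewClass:
    value1: int
    value2: int


obj = NewClass(1, 2)
seen = {obj}

try:
    obj.value1 = 99  # not allowed on a frozen dataclass
except FrozenInstanceError:
    mutated = False
else:
    mutated = True

# replace() returns a new instance and leaves obj untouched.
changed = replace(obj, value1=99)

print(mutated)          # False
print(obj in seen)      # True
print(changed in seen)  # False
```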
Upvotes: 3
Reputation: 295315
Assuming your value1 and value2 are hashable (integers, strings, and tuples of hashable items are fine; lists and dicts are not), you can hash them. Implementing both __hash__ and __eq__ will allow the built-in set type to identify duplicates.
class NewClass:
    def __init__(self, value1, value2):
        self.value1 = value1
        self.value2 = value2

    def __hash__(self):
        return hash((self.value1, self.value2))

    def __eq__(self, other):
        # Let Python handle comparisons with unrelated types.
        if not isinstance(other, NewClass):
            return NotImplemented
        return self.value1 == other.value1 and self.value2 == other.value2

    def __repr__(self):
        return 'NewClass(%r, %r)' % (self.value1, self.value2)

print(set([NewClass(1, 2), NewClass(1, 2), NewClass(3, 4)]))
...properly returns:
{NewClass(1, 2), NewClass(3, 4)}
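As a rough illustration of the hashability assumption, the sketch below reuses the same class and shows that a list attribute makes hashing fail, while the equivalent tuple works. Converting lists to tuples is only one possible workaround, not something the question requires.

```python
class NewClass:
    def __init__(self, value1, value2):
        self.value1 = value1
        self.value2 = value2

    def __hash__(self):
        return hash((self.value1, self.value2))

    def __eq__(self, other):
        if not isinstance(other, NewClass):
            return NotImplemented
        return self.value1 == other.value1 and self.value2 == other.value2


try:
    hash(NewClass(1, [2, 3]))  # a list inside the tuple cannot be hashed
    hashable = True
except TypeError:
    hashable = False

# The same data stored as a tuple hashes fine and deduplicates as expected.
fixed = {NewClass(1, (2, 3)), NewClass(1, (2, 3))}

print(hashable)    # False
print(len(fixed))  # 1
```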
Upvotes: 3
Reputation: 92440
If you define __hash__ and __eq__ on NewClass, you can pass instances to set() and it will use these methods to determine whether the objects are equal as far as the set is concerned. You need to be careful with mutable instances, since the attributes can change after the object has been added, which changes its hash.
Here's a simple example:
class NewClass:
    def __init__(self, value1, value2):
        self.value1 = value1
        self.value2 = value2

    def __hash__(self):
        # take the hash of the tuple
        return hash((self.value1, self.value2))

    def __eq__(self, other):
        # let Python handle comparisons with unrelated types
        if not isinstance(other, NewClass):
            return NotImplemented
        # are the tuples equal?
        return (self.value1, self.value2) == (other.value1, other.value2)

    def __repr__(self):
        return f'NewClass({self.value1}, {self.value2})'

class TestClass:
    def __init__(self, *values):
        self.values = list(set(values))

obj1 = NewClass(1, 2)
obj2 = NewClass(1, 2)
obj3 = NewClass(5, 2)
test = TestClass(obj1, obj2, obj3)
print(test.values)
# Only the different instances (a set does not guarantee order):
# [NewClass(1, 2), NewClass(5, 2)]
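On the "protocol" part of the question, one possible sketch is to give TestClass the special methods that len(), iteration, in, and + look for. The choice of methods and the use of dict.fromkeys (which keeps the first occurrence of each item and preserves insertion order, unlike set) are assumptions made here, not something the question specifies.

```python
class NewClass:
    def __init__(self, value1, value2):
        self.value1 = value1
        self.value2 = value2

    def __hash__(self):
        return hash((self.value1, self.value2))

    def __eq__(self, other):
        if not isinstance(other, NewClass):
            return NotImplemented
        return (self.value1, self.value2) == (other.value1, other.value2)

    def __repr__(self):
        return f'NewClass({self.value1!r}, {self.value2!r})'


class TestClass:
    def __init__(self, *values):
        # Deduplicate via __hash__/__eq__ while keeping first-seen order.
        self.values = list(dict.fromkeys(values))

    def __len__(self):
        return len(self.values)

    def __iter__(self):
        return iter(self.values)

    def __contains__(self, item):
        return item in self.values

    def __add__(self, other):
        if not isinstance(other, TestClass):
            return NotImplemented
        # Building a new TestClass deduplicates the combined values again.
        return TestClass(*self.values, *other.values)

    def __repr__(self):
        return f"TestClass({', '.join(map(repr, self.values))})"


test = TestClass(NewClass(1, 2), NewClass(1, 2), NewClass(5, 2))
merged = test + TestClass(NewClass(5, 2), NewClass(7, 8))

print(len(test))               # 2
print(NewClass(5, 2) in test)  # True
print(merged)  # TestClass(NewClass(1, 2), NewClass(5, 2), NewClass(7, 8))
```

Compared with the hand-rolled _set method, the main benefit is that the duplicate logic lives in NewClass itself, so sets, dicts, and the in operator all agree on what "the same object" means.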
Upvotes: 2