Reputation: 2365
I need to make multiple comparisons of class objects. However, only values of selected fields are subject to comparison, i.e.:
class Class:
    def __init__(self, value1, value2, value3, dummy_value):
        self.field1 = value1
        self.field2 = value2
        self.field3 = value3
        self.irrelevant_field = dummy_value

obj1 = Class(1, 2, 3, 'a')
obj2 = Class(1, 2, 3, 'b')  # compare(obj1, obj2) = True
obj3 = Class(1, 2, 4, 'a')  # compare(obj1, obj3) = False
Currently I do it this way:
def dumm_compare(obj1, obj2):
    if obj1.field1 != obj2.field1:
        return False
    if obj1.field2 != obj2.field2:
        return False
    if obj1.field3 != obj2.field3:
        return False
    return True
As my actual number of relevant fields is greater than 10, this approach leads to quite bulky code. That's why I tried something like this:
def cute_compare(obj1, obj2):
    for field in filter(lambda x: x.startswith('field'), dir(obj1)):
        if getattr(obj1, field) != getattr(obj2, field):
            return False
    return True
The code is compact; however, the performance suffers significantly:
import time

starttime = time.time()
for i in range(100000):
    dumm_compare(obj1, obj2)
print('Dumm compare runtime: {:.3f} s'.format(time.time() - starttime))

starttime = time.time()
for i in range(100000):
    cute_compare(obj1, obj2)
print('Cute compare runtime: {:.3f} s'.format(time.time() - starttime))

# Dumm compare runtime: 0.046 s
# Cute compare runtime: 1.603 s
Is there a way to implement selective object comparison more efficiently?
EDIT: In fact I need several such functions (which compare objects by different, sometimes overlapping, sets of fields). That's why I do not want to override built-in class methods such as __eq__.
Upvotes: 2
Views: 89
Reputation: 12154
If the fields exist for all instances in one particular comparison set, try caching the list of fields to compare on the class.
def prepped_compare(obj1, obj2):
    li_field = getattr(obj1, "li_field", None)
    if li_field is None:
        # Grab the list from the first object being compared; this assumes
        # a fixed field list per run.
        # Mind you, getattr(obj, missing_field) raises anyway,
        # so you are already making that assumption.
        li_field = [f for f in vars(obj1) if f.startswith('field')]
        # Cache on the class so later calls skip the lookup above
        obj1.__class__.li_field = li_field
    for field in li_field:
        if getattr(obj1, field) != getattr(obj2, field):
            return False
    return True
Or, better, pre-compute the field list outside the function and pass it in:
def prepped_compare2(obj1, obj2, li_field):
    for field in li_field:
        if getattr(obj1, field) != getattr(obj2, field):
            return False
    return True

starttime = time.time()
li_field = [f for f in vars(obj1) if f.startswith('field')]
for i in range(100000):
    prepped_compare2(obj1, obj2, li_field)
print('prepped2 compare runtime: {:.3f} s'.format(time.time() - starttime))
output:
Dumm compare runtime: 0.051 s
Cute compare runtime: 0.762 s
prepped compare runtime: 0.122 s
prepped2 compare runtime: 0.093 s
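Since you mention needing several comparisons over different, overlapping field sets, here is a minimal sketch of the same pre-computed-list idea using operator.attrgetter from the standard library. The names make_comparator, compare_all and compare_first_two are hypothetical, and I have not timed this variant, so treat it as an illustration rather than a benchmarked recommendation.

```python
from operator import attrgetter


def make_comparator(*field_names):
    """Build a comparison function for one fixed set of attribute names."""
    # With several names, attrgetter returns a tuple of the attribute values;
    # with a single name it returns the bare value, which compares just as well.
    get_fields = attrgetter(*field_names)

    def compare(obj1, obj2):
        return get_fields(obj1) == get_fields(obj2)

    return compare


class Class:
    def __init__(self, value1, value2, value3, dummy_value):
        self.field1 = value1
        self.field2 = value2
        self.field3 = value3
        self.irrelevant_field = dummy_value


# Two hypothetical, overlapping field sets
compare_all = make_comparator('field1', 'field2', 'field3')
compare_first_two = make_comparator('field1', 'field2')

obj1 = Class(1, 2, 3, 'a')
obj2 = Class(1, 2, 3, 'b')
obj3 = Class(1, 2, 4, 'a')

print(compare_all(obj1, obj2))        # True
print(compare_all(obj1, obj3))        # False
print(compare_first_two(obj1, obj3))  # True
```

One trade-off: this fetches every listed attribute from both objects before comparing, so unlike the loop versions it does not stop at the first mismatching field.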
Regarding overriding __eq__, I am pretty sure you could do something like this:
def mycomp01(self, obj2):  # possibly with a saved field list01 on the class
    ...
def mycomp02(self, obj2):  # possibly with a saved field list02 on the class
    ...
# let's do comp01
Class.__eq__ = mycomp01
# run comp01 tests
Class.__eq__ = mycomp02
# run comp02 tests
Upvotes: 1
Reputation: 1121814
dir() not only includes instance attributes, but it'll traverse the class hierarchy as well. As such it does much more work than is needed here; dir() is really only suitable for debugging tasks.
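To make the difference concrete, here is a small sketch using the Class from the question; the variable names instance_names and all_names are mine.

```python
class Class:
    def __init__(self, value1, value2, value3, dummy_value):
        self.field1 = value1
        self.field2 = value2
        self.field3 = value3
        self.irrelevant_field = dummy_value


obj = Class(1, 2, 3, 'a')

# vars() exposes only the instance's own __dict__
instance_names = sorted(vars(obj))
# dir() also collects names from the class and its bases (object, here)
all_names = dir(obj)

print(instance_names)  # ['field1', 'field2', 'field3', 'irrelevant_field']
print('__init__' in all_names)                # True
print(len(all_names) > len(instance_names))   # True
```

Every call to dir() has to build and sort that longer list, which is the extra work the filter in cute_compare then throws away.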
Stick to using vars() instead, perhaps combined with all():
def faster_compare(obj1, obj2):
    obj2_vars = vars(obj2)
    return all(value == obj2_vars[field]
               for field, value in vars(obj1).items() if field.startswith('field'))
vars() returns a dictionary containing the attributes of the instance only; in the above generator expression I access both the attribute name and its value in one step by using the dict.items() method.
I replaced the getattr() call for obj2 with the same dictionary approach; this saves a frame stack push and pop each time, as the key lookup can be handled entirely in bytecode (C code). Note that this does assume you are not using properties; only actual instance attributes are going to be listed.
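A quick sketch of that caveat, using a hypothetical WithProperty class that is not part of the question: a property lives on the class, so vars() on the instance never sees it.

```python
class WithProperty:
    def __init__(self, value):
        self.field1 = value

    @property
    def field2(self):
        # Computed on access; stored on the class, not in the instance __dict__
        return self.field1 * 2


obj = WithProperty(21)

print(sorted(vars(obj)))       # ['field1']
print('field2' in vars(obj))   # False
print('field2' in dir(obj))    # True
print(obj.field2)              # 42
```

So a vars()-based comparison would silently skip field2 here; if your relevant fields include properties, you would need getattr() with an explicit list of names instead.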
This method still has to do more work than hardcoding the if branches, but at least it does not perform all that badly:
>>> from timeit import timeit
>>> timeit('compare(obj1, obj2)', 'from __main__ import obj1, obj2, dumm_compare as compare')
0.349234500026796
>>> timeit('compare(obj1, obj2)', 'from __main__ import obj1, obj2, cute_compare as compare')
16.48695448896615
>>> timeit('compare(obj1, obj2)', 'from __main__ import obj1, obj2, faster_compare as compare')
1.9555692840367556
Upvotes: 1