Roman

Reputation: 2365

Selective comparison of class objects

I need to compare class instances many times. However, only the values of selected fields should take part in the comparison, e.g.:

class Class:
    def __init__(self, value1, value2, value3, dummy_value):
        self.field1 = value1
        self.field2 = value2
        self.field3 = value3
        self.irrelevant_field = dummy_value

obj1 = Class(1, 2, 3, 'a')
obj2 = Class(1, 2, 3, 'b') #compare(obj1, obj2) = True
obj3 = Class(1, 2, 4, 'a') #compare(obj1, obj3) = False

Currently I do it this way:

def dumm_compare(obj1, obj2):
    if obj1.field1 != obj2.field1:
        return False
    if obj1.field2 != obj2.field2:
        return False
    if obj1.field3 != obj2.field3:
        return False
    return True

As my actual number of relevant fields is greater than 10, this approach leads to quite bulky code. That's why I tried something like this:

def cute_compare(obj1, obj2):
    for field in filter(lambda x: x.startswith('field'), dir(obj1)):
        if getattr(obj1, field) != getattr(obj2, field):
            return False
    return True

The code is compact; however, the performance suffers significantly:

import time

starttime = time.time()
for i in range(100000):
    dumm_compare(obj1, obj2)
print('Dumm compare runtime: {:.3f} s'.format(time.time() - starttime))

starttime = time.time()
for i in range(100000):
    cute_compare(obj1, obj2)
print('Cute compare runtime: {:.3f} s'.format(time.time() - starttime))

#Dumm compare runtime: 0.046 s
#Cute compare runtime: 1.603 s

Is there a way to implement selective object comparison more efficiently?

EDIT: In fact I need several such functions (which compare objects by different, sometimes overlapping, sets of fields). That's why I do not want to overwrite built-in class methods.

Upvotes: 2

Views: 89

Answers (2)

JL Peyret

Reputation: 12154

If the fields exist on all instances in one particular comparison set, try saving the list of fields to compare on the class.

def prepped_compare(obj1, obj2):
    li_field = getattr(obj1, "li_field", None)
    if li_field is None:
        # Grab the field list from the first object compared and cache it
        # on the class; this assumes a fixed field list per run.
        # getattr(obj, 'nonexistent_field') raises AttributeError anyway,
        # so that assumption is already being made.
        li_field = [f for f in vars(obj1) if f.startswith('field')]
        obj1.__class__.li_field = li_field

    for field in li_field:
        if getattr(obj1, field) != getattr(obj2, field):
            return False
    return True    

Or, better, pre-compute the field list outside the function:

def prepped_compare2(obj1, obj2, li_field):

    for field in li_field:
        if getattr(obj1, field) != getattr(obj2, field):
            return False
    return True    


starttime = time.time()
li_field = [f for f in vars(obj1) if f.startswith('field')]
for i in range(100000):
    prepped_compare2(obj1, obj2, li_field)
print('prepped2 compare runtime: {:.3f} s'.format(time.time() - starttime))

output:

Dumm compare runtime: 0.051 s
Cute compare runtime: 0.762 s
prepped compare runtime: 0.122 s
prepped2 compare runtime: 0.093 s
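
Since the question asks for several comparisons over different, sometimes overlapping, field sets, the pre-computed list idea can be pushed one step further. What follows is only a minimal sketch, assuming the standard library's operator.attrgetter is acceptable here; make_comparer, compare_a and compare_b are hypothetical names, and the field sets are picked purely for illustration.

```python
from operator import attrgetter


class Class:
    def __init__(self, value1, value2, value3, dummy_value):
        self.field1 = value1
        self.field2 = value2
        self.field3 = value3
        self.irrelevant_field = dummy_value


def make_comparer(*fields):
    """Return a function comparing two objects on the given fields only.

    Hypothetical helper, not part of the original answer.
    """
    getter = attrgetter(*fields)  # built once, reused on every call

    def compare(obj1, obj2):
        # With two or more names attrgetter returns a tuple of values,
        # so one equality test covers all selected fields.
        return getter(obj1) == getter(obj2)

    return compare


# Two comparers over different, overlapping field sets (illustrative only).
compare_a = make_comparer('field1', 'field2', 'field3')
compare_b = make_comparer('field3', 'irrelevant_field')

obj1 = Class(1, 2, 3, 'a')
obj2 = Class(1, 2, 3, 'b')
obj3 = Class(1, 2, 4, 'a')

print(compare_a(obj1, obj2))  # True
print(compare_a(obj1, obj3))  # False
print(compare_b(obj1, obj2))  # False
```

Note that this variant fetches every selected attribute from both objects before comparing, so unlike the explicit loop it has no early exit during attribute lookup. I have not timed it against the versions above.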

Regarding overriding __eq__: I am fairly sure you could swap the comparison in and out, something like this:

def mycomp01(self, obj2): ...  # possibly with a saved field list01 on the class
def mycomp02(self, obj2): ...  # possibly with a saved field list02 on the class

# let's do comp01.
Class.__eq__ = mycomp01
# run comp01 tests
Class.__eq__ = mycomp02
# run comp02 tests

Upvotes: 1

Martijn Pieters

Reputation: 1121814

dir() not only includes instance attributes, but it'll traverse the class hierarchy as well. As such it does much more work than is needed here; dir() is really only suitable for debugging tasks.
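
To make the difference concrete, here is a small sketch using the class from the question (the variable names instance_names and all_names are mine):

```python
class Class:
    def __init__(self, value1, value2, value3, dummy_value):
        self.field1 = value1
        self.field2 = value2
        self.field3 = value3
        self.irrelevant_field = dummy_value


obj = Class(1, 2, 3, 'a')

# vars() exposes only what was stored on the instance itself.
instance_names = sorted(vars(obj))
# dir() additionally lists names found on the class and its bases.
all_names = dir(obj)

print(instance_names)                       # ['field1', 'field2', 'field3', 'irrelevant_field']
print('__eq__' in all_names)                # True
print(len(all_names) > len(instance_names)) # True
```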

Stick to using vars() instead, perhaps combined with all():

def faster_compare(obj1, obj2):
    obj2_vars = vars(obj2)
    return all(value == obj2_vars[field]
               for field, value in vars(obj1).items() if field.startswith('field'))

vars() returns a dictionary containing the attributes of the instance only; in the above generator expression I access both the attribute name and its value in one step by using the dict.items() method.

I replaced the getattr() function call for obj2 with the same dictionary approach; this saves a frame stack push and pop each time, as the key lookup can be handled entirely in bytecode (C code). Note that this does assume you are not using properties; only actual instance attributes are going to be listed.
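
A quick illustration of the property caveat, using a hypothetical WithProperty class that is not part of the question:

```python
class WithProperty:
    def __init__(self, value):
        self.field1 = value

    @property
    def field2(self):
        # Computed on access; never stored in the instance __dict__.
        return self.field1 * 2


obj = WithProperty(21)

print('field1' in vars(obj))  # True
print('field2' in vars(obj))  # False: properties live on the class
print('field2' in dir(obj))   # True
print(obj.field2)             # 42
```

So a vars()-based comparison would silently skip field2 here, while a getattr()-based one driven by an explicit field list would include it.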

This method still has to do more work than hardcoding the if branches, but at least it does not perform all that badly:

>>> from timeit import timeit
>>> timeit('compare(obj1, obj2)', 'from __main__ import obj1, obj2, dumm_compare as compare')
0.349234500026796
>>> timeit('compare(obj1, obj2)', 'from __main__ import obj1, obj2, cute_compare as compare')
16.48695448896615
>>> timeit('compare(obj1, obj2)', 'from __main__ import obj1, obj2, faster_compare as compare')
1.9555692840367556

Upvotes: 1
