Brian

Reputation: 923

Python: Testing equivalence of sets of custom classes when all instances are unique by definition?

Using Python 2.6, with the set() builtin, not sets.Set.

I have defined some custom data abstraction classes, which will be made members of some sets using the builtin set() object.

The classes are already being stored in a separate structure, before being divided up into sets. All instances of the classes are declared first. No class instances are created or deleted after the first set is declared. No two class instances are ever considered to be "equal" to each other. (Two instances of the class, containing identical data, are considered not the same. A == B is False for all A,B where B is not A.)

Given the above, will there be any reasonable difference between these strategies for testing set_a == set_b?:

Option 1: Store integers in the sets that uniquely identify instances of my class.

Option 2: Store instances of my class, and implement __hash__() and __eq__() to compare id(self) == id(other). (This may not be necessary? Do default implementations of these functions in object just do the same thing but faster?) Possibly use an instance variable that increments every time a new instance calls __init__(). (Not thread safe?)

or,

Option 3: The instances are already stored and looked up in dictionaries keyed by rather long strings. The strings are what most directly represents what the instances are, and are kept unique. I thought storing these strings in the sets would be a RAM overhead and/or create a bunch of extra runtime by calling __eq__() and __hash__(). If this is not the case, I should store the strings directly. (But I think what I've read so far tells me it is the case.)

I'm somewhat new to sets in Python. I've figured out some of what I need to know already, just want to make sure I'm not overlooking something tricky or drawing a false conclusion somewhere.

Upvotes: 1

Views: 371

Answers (1)

georg

Reputation: 214989

I might be misunderstanding the question, but this is how Python behaves by default:

class Foo(object):
    pass

a = Foo()
b = Foo()
c = Foo()

x = set([a, b])
y = set([a, b])
z = set([a, c])

print x == y # True
print x == z # False

"Do default implementations of these functions in object just do the same thing but faster?"

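Yes. According to the Python 2 docs, user-defined classes have __cmp__() and __hash__() methods by default; with them, all objects compare unequal (except with themselves) and x.__hash__() returns id(x). Overriding __eq__() and __hash__() to compare id() values would only reproduce that default behavior, so you can store the instances in the sets directly.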
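As a rough check of the above, here is a minimal sketch that relies only on the defaults. The class name Foo and its data attribute are placeholders, not the asker's actual class, and the last part assumes every instance stays alive for the whole comparison, as the question describes, so that id() values are never reused.

```python
class Foo(object):
    """Hypothetical stand-in for the custom class; defines no __eq__ or __hash__."""
    def __init__(self, data):
        self.data = data


a = Foo("same data")
b = Foo("same data")

# Default equality is identity: identical contents do not make instances equal.
assert a == a
assert not (a == b)

# Default hashing is consistent with that equality, so both fit in one set.
assert len(set([a, b])) == 2

# Sets of instances compare by membership, independent of insertion order.
set_a = set([a, b])
set_b = set([b, a])
assert set_a == set_b

# Option 1 from the question (a unique integer per instance) gives the same
# answer here, as long as the instances are kept alive.
ids_a = set(id(obj) for obj in set_a)
ids_b = set(id(obj) for obj in set_b)
assert (ids_a == ids_b) == (set_a == set_b)
```

If this holds for your class, Option 2 needs no extra code at all, and Option 1 only adds a layer of indirection without changing the result of set_a == set_b.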

Upvotes: 1
