Reputation: 542
I have data, and each entry needs to be an instance of a class. I'm expecting to encounter many duplicate entries in my data, and I essentially want to end up with a set of all the unique entries (i.e. discard any duplicates). However, instantiating the whole lot and putting them into a set after the fact is not optimal, because the `__init__()` method does quite a lot of costly computation for each unique entry, so I want to avoid redoing these computations unnecessarily.
I recognize that this is basically the same question asked here, but...
the accepted answer doesn't actually solve the problem. If you make `__new__()` return an existing instance, it doesn't technically make a new instance, but it still calls `__init__()`, which then redoes all the work you've already done, making the override of `__new__()` completely pointless. (This is easily demonstrated by inserting `print` statements inside `__new__()` and `__init__()` so you can see when they run.)
the other answer requires calling a class method instead of calling the class itself when you want a new instance (e.g. `x = MyClass.make_new()` instead of `x = MyClass()`). This works, but it isn't ideal IMHO, since it is not the normal way one would think to make a new instance.
Can `__new__()` be overridden so that it returns an existing instance without running `__init__()` on it again? If this isn't possible, is there maybe another way to go about this?
Upvotes: 2
Views: 1137
Reputation: 114440
Assuming you have a way of identifying duplicate instances, and a mapping of such instances, you have a few viable options:

Use a `classmethod` to get your instances for you. The classmethod would serve a similar purpose to `__call__` in your metaclass (currently `type`). The main difference is that it would check whether an instance with the requested key already exists before calling `__new__`:
```python
class QuasiSingleton:
    @classmethod
    def make_key(cls, *args, **kwargs):
        # Create a hashable instance key from the initialization parameters
        ...

    @classmethod
    def get_instance(cls, *args, **kwargs):
        key = cls.make_key(*args, **kwargs)
        if not hasattr(cls, 'instances'):
            cls.instances = {}
        if key in cls.instances:
            return cls.instances[key]
        # Only call __init__ as a last resort
        inst = cls(*args, **kwargs)
        cls.instances[key] = inst
        return inst
```
I would recommend using this base class, especially if your class is mutable in any way. You do not want modifications of one instance to show up in another without making it clear that the instances are potentially the same. Writing `cls(*args, **kwargs)` implies that you are getting a different instance every time, or at least that your instances are immutable and you don't care.
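Putting the base class to use might look like the following self-contained sketch (the `Entry` subclass and the `init_calls` list standing in for the expensive computation are hypothetical, added only to make the caching visible):

```python
class QuasiSingleton:
    @classmethod
    def make_key(cls, *args, **kwargs):
        # Hashable key built from the constructor arguments
        return (args, tuple(sorted(kwargs.items())))

    @classmethod
    def get_instance(cls, *args, **kwargs):
        key = cls.make_key(*args, **kwargs)
        if not hasattr(cls, 'instances'):
            cls.instances = {}
        if key in cls.instances:
            return cls.instances[key]
        inst = cls(*args, **kwargs)  # __init__ runs only on a cache miss
        cls.instances[key] = inst
        return inst

init_calls = []

class Entry(QuasiSingleton):
    def __init__(self, value):
        init_calls.append(value)  # stands in for the expensive computation
        self.value = value

a = Entry.get_instance(1)
b = Entry.get_instance(1)  # cache hit: same object, __init__ not re-run
c = Entry.get_instance(2)
```

Note that `a is b` holds afterwards, while `__init__` has run only twice (once per unique key).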
Redefine `__call__` in your metaclass:

```python
class QuasiSingletonMeta(type):
    def make_key(cls, *args, **kwargs):
        ...

    def __call__(cls, *args, **kwargs):
        key = cls.make_key(*args, **kwargs)
        if not hasattr(cls, 'instances'):
            cls.instances = {}
        if key in cls.instances:
            return cls.instances[key]
        inst = super().__call__(*args, **kwargs)
        cls.instances[key] = inst
        return inst
```
Here, `super().__call__` is equivalent to calling `__new__` and `__init__` for `cls`.
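For comparison, here is a self-contained sketch of the metaclass route (again, `Entry` and the `init_calls` list are hypothetical stand-ins for a class with an expensive `__init__`):

```python
init_calls = []

class QuasiSingletonMeta(type):
    def make_key(cls, *args, **kwargs):
        # Hashable key built from the constructor arguments
        return (args, tuple(sorted(kwargs.items())))

    def __call__(cls, *args, **kwargs):
        key = cls.make_key(*args, **kwargs)
        if not hasattr(cls, 'instances'):
            cls.instances = {}
        if key in cls.instances:
            return cls.instances[key]
        inst = super().__call__(*args, **kwargs)  # runs __new__ and __init__
        cls.instances[key] = inst
        return inst

class Entry(metaclass=QuasiSingletonMeta):
    def __init__(self, value):
        init_calls.append(value)  # stands in for the expensive computation
        self.value = value

a = Entry(1)
b = Entry(1)  # cache hit: __init__ does not run again
```

The user-facing difference is exactly the one described below: the caller writes a plain `Entry(1)` and may silently receive an existing object.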
In both cases, the basic caching code is the same. The main difference is how a new instance is obtained from the user's perspective. Using a `classmethod` like `get_instance` intuitively informs the user that they may be getting a duplicate instance. Using a normal call to the class object implies that the instance will always be new, so this should only be done for immutable classes.
Notice that in neither case shown above is there much of a point to calling `__new__` without `__init__`.
A third, hybrid option is possible, though. With this option, you would be creating a new instance every time, but copying over the expensive part of `__init__`'s computation from an existing instance instead of doing it all over again. This version would not cause any problems even if implemented through the metaclass, since all the instances would in fact be independent:
```python
class QuasiSingleton:
    @classmethod
    def make_key(cls, *args, **kwargs):
        ...

    def __new__(cls, *args, **kwargs):
        if 'cache' not in cls.__dict__:
            cls.cache = {}
        return super().__new__(cls)  # object.__new__ accepts no extra arguments

    def __init__(self, *args, **kwargs):
        key = self.make_key(*args, **kwargs)
        cache = type(self).cache  # look up the cache on the class, not the instance
        if key in cache:
            data = cache[key]
        else:
            data = ...  # do the lengthy computation here
            cache[key] = data
        # Initialize self with the data object
```
With this option, remember to call `super().__init__` (and `super().__new__`, if you need it).
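To make the hybrid concrete, here is a self-contained sketch in which a trivial doubling stands in for the lengthy computation (the `CachedData` class, the `compute_calls` list, and the doubling are all hypothetical):

```python
compute_calls = []

class CachedData:
    @classmethod
    def make_key(cls, *args, **kwargs):
        # Hashable key built from the constructor arguments
        return (args, tuple(sorted(kwargs.items())))

    def __new__(cls, *args, **kwargs):
        if 'cache' not in cls.__dict__:  # per-class cache of computed data
            cls.cache = {}
        return super().__new__(cls)  # object.__new__ takes no extra arguments

    def __init__(self, value):
        key = self.make_key(value)
        cache = type(self).cache
        if key in cache:
            data = cache[key]  # reuse the expensive result
        else:
            compute_calls.append(value)
            data = value * 2  # stands in for the lengthy computation
            cache[key] = data
        self.data = data  # instances stay independent; only the data is shared

a = CachedData(21)
b = CachedData(21)  # a distinct object, but the computation is not repeated
```

Here `a is not b`, so mutating one instance cannot affect the other, yet the expensive work ran only once.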
Upvotes: 3