ibonyun

Reputation: 542

How to make a class which disallows duplicate instances (returning an existing instance where possible)?

I have data, and each entry needs to be an instance of a class. I'm expecting to encounter many duplicate entries in my data. I essentially want to end up with a set of all the unique entries (i.e. discard any duplicates). However, instantiating the whole lot and putting them into a set after the fact is not optimal because...

  1. I have many entries,
  2. the proportion of duplicated entries is expected to be rather high,
  3. my __init__() method is doing quite a lot of costly computation for each unique entry, so I want to avoid redoing these computations unnecessarily.

I recognize that this is basically the same question asked here but...

  1. the accepted answer doesn't actually solve the problem. If you make __new__() return an existing instance, it doesn't technically make a new instance, but it still calls __init__() which then redoes all the work you've already done, which makes overriding __new__() completely pointless. (This is easily demonstrated by inserting print statements inside __new__() and __init__() so you can see when they run.)

  2. the other answer requires calling a class method instead of calling the class itself when you want a new instance (e.g. x = MyClass.make_new() instead of x = MyClass()). This works, but it isn't ideal IMHO since it is not the normal way one would think to make a new instance.

Can __new__() be overridden so that it will return an existing entity without running __init__() on it again? If this isn't possible, is there maybe another way to go about this?

Upvotes: 2

Views: 1137

Answers (1)

Mad Physicist

Reputation: 114440

Assuming you have a way of identifying duplicate instances, and a mapping of such instances, you have a few viable options:

  1. Use a classmethod to get your instances for you. The classmethod would serve a similar purpose to __call__ in your metaclass (currently type). The main difference is that it would check if an instance with the requested key already exists before calling __new__:

    class QuasiSingleton:
        @classmethod
        def make_key(cls, *args, **kwargs):
            # Create a hashable instance key from the initialization parameters
            ...
    
        @classmethod
        def get_instance(cls, *args, **kwargs):
            key = cls.make_key(*args, **kwargs)
            if not hasattr(cls, 'instances'):
                cls.instances = {}
            if key in cls.instances:
                return cls.instances[key]
            # Only call __init__ as a last resort
            inst = cls(*args, **kwargs)
            cls.instances[key] = inst
            return inst
    

    I would recommend using this base class, especially if your class is mutable in any way. You do not want modifications of one instance to show up in another without making it clear that the instances are potentially the same. Doing cls(*args, **kwargs) implies that you are getting a different instance every time, or at least that your instances are immutable and you don't care.

  2. Redefine __call__ in your metaclass:

    class QuasiSingletonMeta(type):
        def make_key(cls, *args, **kwargs):
            ...
    
        def __call__(cls, *args, **kwargs):
            key = cls.make_key(*args, **kwargs)
            if not hasattr(cls, 'instances'):
                cls.instances = {}
            if key in cls.instances:
                return cls.instances[key]
            inst = super().__call__(*args, **kwargs)
            cls.instances[key] = inst
            return inst
    

    Here, super().__call__ is equivalent to calling __new__ and __init__ for cls.

In both cases, the basic caching code is the same. The main difference is how to get a new instance from the user's perspective. Using a classmethod like get_instance intuitively informs the user that they are getting a duplicate instance. Using a normal call to the class object implies that the instance will always be new, so should only be done for immutable classes.
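For concreteness, here is a runnable sketch of the first option. The Point subclass, its fields, and its key function are hypothetical, added purely to exercise the caching:

```python
# Hypothetical demonstration of the classmethod approach.
class QuasiSingleton:
    @classmethod
    def make_key(cls, *args, **kwargs):
        raise NotImplementedError  # subclasses define their own key

    @classmethod
    def get_instance(cls, *args, **kwargs):
        key = cls.make_key(*args, **kwargs)
        if 'instances' not in cls.__dict__:
            cls.instances = {}      # each subclass gets its own cache
        if key not in cls.instances:
            cls.instances[key] = cls(*args, **kwargs)
        return cls.instances[key]


class Point(QuasiSingleton):
    @classmethod
    def make_key(cls, x, y):
        return (x, y)

    def __init__(self, x, y):
        self.x, self.y = x, y       # imagine expensive computation here


a = Point.get_instance(1, 2)
b = Point.get_instance(1, 2)
print(a is b)  # True: the second call returned the cached instance
```

Note that calling Point(1, 2) directly still bypasses the cache, which is exactly why the classmethod spelling makes the sharing explicit to the caller.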

Notice that in neither case shown above is there much of a point to calling __new__ without __init__.
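As a sanity check of the metaclass route, a minimal sketch (the Color class and the tuple-of-positional-args key are assumptions for demonstration):

```python
# Hypothetical demonstration of the metaclass approach.
class QuasiSingletonMeta(type):
    def make_key(cls, *args, **kwargs):
        return args  # assume the positional arguments are hashable

    def __call__(cls, *args, **kwargs):
        key = cls.make_key(*args, **kwargs)
        if 'instances' not in cls.__dict__:
            cls.instances = {}
        if key not in cls.instances:
            # Runs __new__ and __init__ only for an unseen key
            cls.instances[key] = super().__call__(*args, **kwargs)
        return cls.instances[key]


class Color(metaclass=QuasiSingletonMeta):
    def __init__(self, name):
        self.name = name  # imagine expensive computation here


print(Color('red') is Color('red'))   # True: __init__ ran only once
print(Color('red') is Color('blue'))  # False: different keys
```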

  3. A third, hybrid option is possible, though. With this option, you would be creating a new instance, but copying over the expensive part of __init__'s computation from an existing instance, instead of doing it all over again. This version would not cause any problems if implemented through the metaclass, since all the instances would in fact be independent:

    class QuasiSingleton:
        @classmethod
        def make_key(cls, *args, **kwargs):
            ...
    
        def __new__(cls, *args, **kwargs):
            if 'cache' not in cls.__dict__:
                cls.cache = {}
            return super().__new__(cls)  # object.__new__() takes no extra arguments
    
        def __init__(self, *args, **kwargs):
            key = self.make_key(*args, **kwargs)
            if key in self.cache:  # i.e. type(self).cache
                data = self.cache[key]
            else:
                data = ...  # Do the lengthy computation here
                self.cache[key] = data  # Store it for future instances
            # Initialize self with the data object
    

    With this option, remember to call super().__init__ (and super().__new__, if you need it).
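Filled in with a stand-in computation, the hybrid option behaves like this (the Record class and its value * 2 "computation" are hypothetical placeholders):

```python
# Hypothetical sketch of the hybrid option: a fresh instance on every call,
# but the expensive computation runs only once per key.
class Record:
    @classmethod
    def make_key(cls, value):
        return value

    def __new__(cls, *args, **kwargs):
        if 'cache' not in cls.__dict__:
            cls.cache = {}
        return super().__new__(cls)  # object.__new__() takes no extra args

    def __init__(self, value):
        key = self.make_key(value)
        if key in self.cache:
            data = self.cache[key]
        else:
            data = value * 2         # stand-in for the lengthy computation
            self.cache[key] = data
        self.data = data


a = Record(21)
b = Record(21)
print(a is b)          # False: the instances are independent
print(a.data, b.data)  # 42 42: computed once, then copied from the cache
```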

Upvotes: 3
