Davide R.
Davide R.

Reputation: 900

Garbage collect a class with a reference to its instance?

Consider this code snippet:

import gc
from weakref import ref


def leak_class(create_ref):
    class Foo(object):
        # make cycle non-garbage collectable
        def __del__(self):
            pass

    if create_ref:
        # create a strong reference cycle
        Foo.bar = Foo()
    return ref(Foo)


# without reference cycle
r = leak_class(False)
gc.collect()
print r() # prints None

# with reference cycle
r = leak_class(True)
gc.collect()
print r() # prints <class '__main__.Foo'>

It creates a reference cycle that cannot be collected, because the referenced instance has a __del__ method. The cycle is created here:

# create a strong reference cycle
Foo.bar = Foo()

This is just a proof of concept, the reference could be added by some external code, a descriptor or anything. If that's not clear to you, remember that each objects mantains a reference to its class:

  +-------------+             +--------------------+
  |             |  Foo.bar    |                    |
  | Foo (class) +------------>| foo (Foo instance) |
  |             |             |                    |
  +-------------+             +----------+---------+
        ^                                |
        |         foo.__class__          |
        +--------------------------------+

If I could guarantee that Foo.bar is only accessed from Foo, the cycle wouldn't be necessary, as theoretically the instance could hold only a weak reference to its class.

Can you think of a practical way to make this work without a leak?


As some are asking why would external code modify a class but can't control its lifecycle, consider this example, similar to the real-life example I was working to:

class Descriptor(object):
    def __get__(self, obj, kls=None):
        if obj is None:
            try:
                obj = kls._my_instance
            except AttributeError:
                obj = kls()
                kls._my_instance = obj
        return obj.something()


# usage example #
class Example(object):
    foo = Descriptor()

    def something(self):
        return 100


print Example.foo

In this code only Descriptor (a non-data descriptor) is part of the API I'm implementing. Example class is an example of how the descriptor would be used.

Why does the descriptor store a reference to an instance inside the class itself? Basically for caching purposes. Descriptor required this contract with the implementor: it would be used in any class assuming that

  1. The class has a constructor with no args, that gives an "anonymous instance" (my definition)
  2. The class has some behavior-specific methods (something here).
  3. An instance of the class can stay alive for an undefined amount of time.

It doesn't assume anything about:

  1. How long it takes to construct an object
  2. Whether the class implements del or other magic methods
  3. How long the class is expected to live

Moreover the API was designed to avoid any extra load on the class implementor. I could have moved the responsibility for caching the object to the implementor, but I wanted a standard behavior.

There actually is a simple solution to this problem: make the default behavior to cache the instance (like it does in this code) but allow the implementor to override it if they have to implement __del__.

Of course this wouldn't be as simple if we assumed that the class state had to be preserved between calls.


As a starting point, I was coding a "weak object", an implementation of object that only kept a weak reference to its class:

from weakref import proxy

def make_proxy(strong_kls):
    kls = proxy(strong_kls)
    class WeakObject(object):
        def __getattribute__(self, name):
            try:
                attr = kls.__dict__[name]
            except KeyError:
                raise AttributeError(name)

            try:
                return attr.__get__(self, kls)
            except AttributeError:
                return attr
        def __setattr__(self, name, value):
            # TODO: implement...
            pass
    return WeakObject

Foo.bar = make_proxy(Foo)()

It appears to work for a limited number of use cases, but I'd have to reimplement the whole set of object methods, and I don't know how to deal with classes that override __new__.

Upvotes: 1

Views: 295

Answers (2)

BrenBarn
BrenBarn

Reputation: 251363

For your example, why don't you store _my_instance in a dict on the descriptor class, rather than on the class holding the descriptor? You could use a weakref or WeakValueDictionary in that dict, so that when the object disappears the dict will just lose its reference and the descriptor will create a new one on the next access.

Edit: I think you have a misunderstanding about the possibility of collecting the class while the instance lives on. Methods in Python are stored on the class, not the instance (barring peculiar tricks). If you have an object obj of class Class, and you allowed Class to be garbage collected while obj still exists, then calling a method obj.meth() on the object would fail, because the method would have disappeared along with the class. That is why your only option is to weaken your class->obj reference; even if you could make objects weakly reference their class, all it would do is break the class if the weakness ever "took effect" (i.e., if the class were collected while an instance still existed).

Upvotes: 2

shx2
shx2

Reputation: 64308

The problem you're facing is just a special case of the general ref-cycle-with-__del__ problem.

I don't see anything unusual in the way the cycles are created in your case, which is to say, you should resort to the standard ways of avoiding the general problem.

I think implementing and using a weak object would be hard to get right, and you would still need to remember to use it in all places where you define __del__. It doesn't sound like the best approach.

Instead, you should try the following:

  1. consider not defining __del__ in your class (recommended)
  2. in classes which define __del__, avoid reference cycles (in general, it might be hard/impossible to make sure no cycles are created anywhere in your code. In your case, seems like you want the cycles to exist)
  3. explicitly break the cycles, using del (if there are appropriate points to do that in your code)
  4. scan the gc.garbage list, and explicitly break reference cycles (using del)

Upvotes: 1

Related Questions