pv.

Reputation: 35145

Self-deadlock due to garbage collector in single-threaded code

I have a design problem: there is a global resource that cannot be accessed from multiple threads at once, so I need a lock around it to serialize access. However, Python's garbage collector can run a __del__ method while I am doing some processing while holding the lock. If the destructor tries to access the resource, the result is a deadlock.

As an example, consider the following innocent-looking single-threaded code, which deadlocks if you run it:

import threading

class Handle(object):
    def __init__(self):
        self.handle = do_stuff("get")

    def close(self):
        h = self.handle
        self.handle = None
        if h is not None:
            do_stuff("close %d" % h)

    def __del__(self):
        self.close()

_resource_lock = threading.Lock()

def do_stuff(what):
    _resource_lock.acquire()
    try:
        # GC can be invoked here -> deadlock!
        for j in range(20):
            list()
        return 1234
    finally:
        _resource_lock.release()

for j in range(1000):
    xs = []
    b = Handle()
    xs.append(b)
    xs.append(xs)

The resource can deal with several "handles" being open at the same time, and I need to manage their life cycle. Abstracting this into a Handle class and putting the cleanup in __del__ seemed like a smart move, but the issue above breaks it.

One way to deal with the cleanup is to keep a "pending cleanup" list of handles: if the lock is held when __del__ runs, insert the handle into that list instead of closing it, and clean the list up later.

The question is:

Upvotes: 3

Views: 1061

Answers (2)

Hu Bo

Reputation: 1

Circular references are not the key to this problem. You might have objects a and b referring to each other, forming a cycle, and a.resource pointing to an object c that has a __del__. After a and b are collected (they have no __del__, so it is safe to collect them), c's reference count drops to zero and c.__del__ is called automatically. This can happen anywhere in the code, and you cannot control when, so it may create a deadlock.
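The scenario described above can be reproduced with a small sketch (names are illustrative; the explicit `gc.collect()` stands in for a collection that would normally happen at an arbitrary point):

```python
import gc

log = []

class C(object):
    def __del__(self):
        # In the deadlock scenario, this would touch the locked resource.
        log.append("c destroyed")

class Node(object):   # no __del__: the cycle itself is collectable
    pass

a = Node()
b = Node()
a.other = b           # a <-> b form a reference cycle
b.other = a
a.resource = C()      # c is only reachable through the cycle

del a, b              # the cycle is now unreachable, but not yet freed
gc.collect()          # collector frees a and b; c's refcount hits zero,
                      # so c.__del__ runs here, mid-program
assert log == ["c destroyed"]
```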

There are also Python implementations without reference counting (e.g. PyPy). On those interpreters, objects are only ever collected by the GC, so __del__ runs at even less predictable times.

The only safe things to do in __del__ are atomic operations. Locks do not work: they either deadlock (threading.Lock) or silently re-enter the critical section while its invariants are broken (threading.RLock). Since appending to a list is atomic in CPython, you can append flags (or closures) to a global list from __del__, and process that list later from regular code to execute the "real destructing".

The threaded GC mode proposed for CPython in PEP 556 might solve the problem: https://www.python.org/dev/peps/pep-0556/

Upvotes: 0

Thomas Orozco

Reputation: 55283

Python 2's garbage collector will not collect reference cycles that contain an object with a custom __del__ method; such objects are parked in gc.garbage instead. (Since Python 3.4 / PEP 442, this is no longer true and such cycles are collected normally.)

Since you already have a __del__ method, all you need is a circular dependency to "disable" the GC for those objects:

class Handle(object):
    def __init__(self):
        self.handle = do_stuff("get")
        self._self = self

Now, that creates a memory leak, so how do we fix this?

Once you're ready to free the objects, just remove the circular dependency:

import threading
import gc


class Handle(object):
    def __init__(self):
        self.handle = do_stuff("get")
        self._self = self

    def close(self):
        h = self.handle
        self.handle = None
        if h is not None:
            do_stuff("close %d" % h)

    def __del__(self):
        self.close()

_resource_lock = threading.Lock()

def do_stuff(what):
    _resource_lock.acquire()
    try:
        # GC can be invoked here -> deadlock!
        for j in range(20):
            list()
        return 1234
    finally:
        _resource_lock.release()

for j in range(1000):
    xs = []
    b = Handle()
    xs.append(b)
    xs.append(xs)


# Make sure the GC is up to date
gc.collect()
print "Length after work", len(gc.garbage)

# These are kept alive due to our circular dependency
# If we remove them from garbage, they come back
del gc.garbage[:]
gc.collect()
print "Length now", len(gc.garbage)

# Let's break it
for handle in gc.garbage:
    handle._self = None

# Now, our objects don't come back
del gc.garbage[:]
gc.collect()
print "Length after breaking circular dependencies", len(gc.garbage)

This prints:

Length after work 999
Length now 999
Length after breaking circular dependencies 0

On the other hand, why do you need to access this complex library in cleanup code, whose execution you don't control?

A cleaner solution here might be to do the cleanup in the loop, and break the circular dependency after the cleanup, so that the GC can then do its thing.

Here's an implementation:

import threading
import gc


class Handle(object):
    def __init__(self):
        self.handle = do_stuff("get")
        self._self = self

    def close(self):
        h = self.handle
        self.handle = None
        if h is not None:
            do_stuff("close %d" % h)
        del self._self

    def __del__(self):
        # Intentionally does nothing: the mere presence of __del__ keeps
        # the cycle in gc.garbage, where close() is called explicitly.
        pass

_resource_lock = threading.Lock()

def do_stuff(what):
    _resource_lock.acquire()
    try:
        # GC can be invoked here -> deadlock!
        for j in range(20):
            list()
        return 1234
    finally:
        _resource_lock.release()

for j in range(1000):
    xs = []
    b = Handle()
    xs.append(b)
    xs.append(xs)


# Make sure the GC is up to date
gc.collect()
print "Length after work", len(gc.garbage)

# These are kept alive due to our circular dependency
# If we remove them from garbage, they come back
del gc.garbage[:]
gc.collect()
print "Length now", len(gc.garbage)

# Let's break it
for handle in gc.garbage:
    handle.close()

# Now, our objects don't come back
del gc.garbage[:]
gc.collect()
print "Length after breaking circular dependencies", len(gc.garbage)

And the output shows that our circular dependency does prevent collection:

Length after work 999
Length now 999
Length after breaking circular dependencies 0

Upvotes: 2
