Reputation: 35145
I have a design problem: there is a global resource that cannot be accessed from multiple threads at once, so I need a lock around it to serialize access. However, Python's garbage collector can run a __del__ method while I am holding the lock and doing some processing. If the destructor tries to access the resource, this ends in a deadlock.
As an example, consider the following innocent-looking single-threaded code, which deadlocks if you run it:
import threading

class Handle(object):
    def __init__(self):
        self.handle = do_stuff("get")

    def close(self):
        h = self.handle
        self.handle = None
        if h is not None:
            do_stuff("close %d" % h)

    def __del__(self):
        self.close()

_resource_lock = threading.Lock()

def do_stuff(what):
    _resource_lock.acquire()
    try:
        # GC can be invoked here -> deadlock!
        for j in range(20):
            list()
        return 1234
    finally:
        _resource_lock.release()

for j in range(1000):
    xs = []
    b = Handle()
    xs.append(b)
    xs.append(xs)
The resource can deal with several "handles" being open at the same time, and I'd need to deal with their life cycle. Abstracting this into a Handle class and putting the cleanup in __del__ seemed like a smart move, but the above issue breaks this.
One way to deal with the cleanup is to keep a "pending cleanup" list of handles: if the lock is held when __del__ is run, insert the handle there, and clean up the list later on.
The question is:
Is there a thread-safe version of gc.disable() / gc.enable() that would solve this in a cleaner way?
Any other ideas for how to deal with this?
Upvotes: 3
Views: 1061
Reputation: 1
Circular references are not the crux of this problem. You might have objects a and b referring to each other to form a cycle, and a.resource pointing to an object c with a __del__. After a and b are collected (they do not have __del__, so it is safe to collect them), c is collected automatically and c.__del__ is called. This can happen anywhere in the code, and you cannot control it, so it may create a deadlock.
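The chain described above can be reproduced in a few lines (a toy sketch, not from the question; the explicit `gc.collect()` merely forces the moment at which the collector would have run anyway):

```python
import gc

log = []

class Resource:
    def __del__(self):
        # Runs whenever the collector happens to free the a/b cycle --
        # in real code this could fire while a lock is held.
        log.append("Resource.__del__ ran")

class Node:
    pass

a, b = Node(), Node()
a.other, b.other = b, a   # a <-> b cycle; neither has __del__
a.resource = Resource()   # c hangs off the cycle

del a, b                  # unreachable, but only the cycle collector frees them
gc.collect()
print(log)                # -> ['Resource.__del__ ran']
```

Only the cycle collector can free a and b (their mutual references keep the refcounts nonzero), and freeing them drops c's refcount to zero, triggering its destructor at an unpredictable point.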
There are also other implementations of Python (e.g. PyPy) without reference counting. With these interpreters, objects are always collected by GC.
The only safe way to use __del__ is to restrict it to atomic operations. Locks DO NOT WORK: they either deadlock (threading.Lock) or silently fail to protect anything (threading.RLock, which the destructor can re-acquire on the very thread that is mid-operation on the resource). Since appending to a list is an atomic operation in CPython, you can push flags (or closures) onto a global list, and check that list from other threads to execute the "real destructing".
The threaded garbage collection proposed for Python 3.7 in PEP 556 might solve the problem, though the PEP was deferred: https://www.python.org/dev/peps/pep-0556/
Upvotes: 0
Reputation: 55283
Python's garbage collector will not clean up circular dependencies that involve a "custom" __del__ method (this holds for Python 2, which this answer targets; since Python 3.4 / PEP 442, such cycles are collected).
Since you already have a __del__ method, all you need is a circular dependency to "disable" the GC for those objects:
class Handle(object):
    def __init__(self):
        self.handle = do_stuff("get")
        self._self = self
Now, that creates a memory leak, so how do we fix this?
Once you're ready to free the objects, just remove the circular dependency:
import threading
import gc

class Handle(object):
    def __init__(self):
        self.handle = do_stuff("get")
        self._self = self

    def close(self):
        h = self.handle
        self.handle = None
        if h is not None:
            do_stuff("close %d" % h)

    def __del__(self):
        self.close()

_resource_lock = threading.Lock()

def do_stuff(what):
    _resource_lock.acquire()
    try:
        # GC can be invoked here -> deadlock!
        for j in range(20):
            list()
        return 1234
    finally:
        _resource_lock.release()

for j in range(1000):
    xs = []
    b = Handle()
    xs.append(b)
    xs.append(xs)

# Make sure the GC is up to date
gc.collect()
print "Length after work", len(gc.garbage)

# These are kept alive by our circular dependency
# If we remove them from garbage, they come back
del gc.garbage[:]
gc.collect()
print "Length now", len(gc.garbage)

# Let's break it
for handle in gc.garbage:
    handle._self = None

# Now, our objects don't come back
del gc.garbage[:]
gc.collect()
print "Length after breaking circular dependencies", len(gc.garbage)
This prints:
Length after work 999
Length now 999
Length after breaking circular dependencies 0
On the other hand, why do you need to access this complex library in cleanup code, whose execution you don't control?
A cleaner solution here might be to do the cleanup in the loop, and break the circular dependency after the cleanup, so that the GC can then do its thing.
Here's an implementation:
import threading
import gc

class Handle(object):
    def __init__(self):
        self.handle = do_stuff("get")
        self._self = self

    def close(self):
        h = self.handle
        self.handle = None
        if h is not None:
            do_stuff("close %d" % h)
            del self._self

    def __del__(self):
        # DO NOT TOUCH THIS
        self._ = None

_resource_lock = threading.Lock()

def do_stuff(what):
    _resource_lock.acquire()
    try:
        # GC can be invoked here -> deadlock!
        for j in range(20):
            list()
        return 1234
    finally:
        _resource_lock.release()

for j in range(1000):
    xs = []
    b = Handle()
    xs.append(b)
    xs.append(xs)

# Make sure the GC is up to date
gc.collect()
print "Length after work", len(gc.garbage)

# These are kept alive by our circular dependency
# If we remove them from garbage, they come back
del gc.garbage[:]
gc.collect()
print "Length now", len(gc.garbage)

# Let's break it
for handle in gc.garbage:
    handle.close()

# Now, our objects don't come back
del gc.garbage[:]
gc.collect()
print "Length after breaking circular dependencies", len(gc.garbage)
And the output shows that our circular dependency does prevent collection:
Length after work 999
Length now 999
Length after breaking circular dependencies 0
Upvotes: 2