Captain Trojan
Captain Trojan

Reputation: 2921

Does breaking a linked list help Python's garbage collector?

I was wondering whether — when I have the opportunity — I should break up a singly-connected linked list in Python when I don't need it anymore, or if I should not bother myself with it.

List example:

class Link:
    def __init__(self, next=None):
        self.next = next


L = Link(Link(Link(Link())))

Breaking the links:

def break_(link):
    if link is not None:
        break_(link.next)
        link.next = None  # link broken

# application
break_(L)

Tree example:

class Node:
    def __init__(self, l=None, r=None):
        self.l = l
        self.r = r

T = \
Node(
    Node(
        Node(), 
        Node()
    ),
    Node(
        Node(), 
        Node()
    ),
)

Breaking the links:

def break_(node):
    if node is not None:
        break_(node.l)
        break_(node.r)
        node.l = None  # link broken
        node.r = None  # link broken
        
# application
break_(T)

Basically, what I wonder is, what is the performance-wise optimal way to write your code in this situation. Is the GC prepared to deal with large linked structures? Doesn't it have to run potentially long DFSes with counters and such to determine which objects can be freed? Isn't it simpler to just break the links and give the GC a bunch of loose objects with zero references anywhere?

I could benchmark this, but I'm looking for an explanation (ideally from Karl), if it's available. Thank you.

Upvotes: 0

Views: 76

Answers (1)

a_guest
a_guest

Reputation: 36309

You can add a __del__ method to your class to see when the object is about to be cleaned up:

class Node:
    def __init__(self, name, l=None, r=None):
        self.name = name
        self.l = l
        self.r = r

    def __del__(self):
        print(f'__del__ {self.name}')


T = \
Node('0',
    Node('0L',
        Node('0LL'), 
        Node('0LR'),
    ),
    Node('0R',
        Node('0RL'), 
        Node('0RR'),
    ),
)
del T  # decreases the reference count of T by 1

What happens with del T is that the reference count of the object referred to by T is decreased by 1. In this case, it happens that the reference count reaches 0. CPython knows this because it stores a reference count together with each object. When the reference count of an object reaches 0, the object will be marked for cleanup. This cleanup happens before control is given back to user code. So for the above example, it means the object that T originally referred to is immediately cleaned up. This first invokes __del__ and then the corresponding C-code for cleanup. This in turn goes through the object's instance dict and decreases the reference count for each other object stored there by 1. If another object's reference count reaches 0 in that process, it is also marked for cleanup. Then this procedure repeats until no objects are marked for cleanup and control is given back to the main loop.

This is the output from the example:

__del__ 0
__del__ 0L
__del__ 0LL
__del__ 0LR
__del__ 0R
__del__ 0RL
__del__ 0RR

As mentioned above, when 0 is cleaned up, all other objects it references have their reference count reduced by 1 (i.e. 0L and 0R) and get cleaned up as well if the ref count reaches 0.

If you manually break the links of the list by setting the corresponding instance attribute to None this adds additional overhead, as it has to actually modify all of these objects in memory (updating their instance dict). The reference count of the child objects reaching 0 is more of a byproduct here. Also note that the order of cleanup is changed since the clean function goes depth first:

def clean(node):
    if node is not None:
        clean(node.l)
        clean(node.r)
        node.l = None  # this updates the object in memory
        node.r = None

clean(T)

produces

__del__ 0LL
__del__ 0LR
__del__ 0RL
__del__ 0RR
__del__ 0L
__del__ 0R
__del__ 0

Upvotes: 1

Related Questions