Ben Usman
Ben Usman

Reputation: 8387

__deepcopy__ object with cyclic references

from copy import deepcopy

class DoubleLinkedListNeedsDeepCopy:
    def __init__(self, val, tail=None):
        self.val = val
        self._next_node = None
        self._tail = tail or self

    def append(self, new_val):
        next_node = type(self)(new_val, self._tail)
        self._next_node = next_node
        return next_node

    def __deepcopy__(self, memo):
        new_copy = type(self)(self.val)
        new_copy._next_node = deepcopy(self._next_node, memo)
        new_copy._tail = deepcopy(self._tail, memo)
        return new_copy

    @property
    def next(self):
        return self._next_node

    @property
    def tail(self):
        return self._tail

    @property
    def is_last(self):
        return self._next_node == None

linked_list = head = DoubleLinkedListNeedsDeepCopy(1)
for i in range(2, 5):
    head = head.append(i)

def print_list(linked_list):
    cur = linked_list
    for i in range(20):
        print(cur.val, end=' ')

        if cur.is_last:
            break
        else:
            cur = cur.next
    print()

import sys
sys.setrecursionlimit(10000)

print_list(linked_list)
linked_list.next.next.val = 5
print_list(linked_list)
list_copy = deepcopy(linked_list)
list_copy.next.next.val = 8
print_list(list_copy)
print_list(linked_list)

Expected output is:

1 2 3 4 
1 2 5 4 
1 2 8 4 
1 2 5 4 

However it fails with RecursionError after following a recursive path: linked_list.next.next.next.tail.next.next.next...

(that is of course a toy example, I need a complicated tree-like structure to be copied in real-life scenario)

Upvotes: 5

Views: 1466

Answers (2)

Grief
Grief

Reputation: 2040

While you've decided to avoid overriding __deepcopy__ at all, the actual question remains unanswered. I was googling for the solution but didn't find anything, so after some trial-and-error attempts I found the answer and I want to post it here.

The reason why the code you've written fails with the RecursionError is the order of execution. memo dictionary is updated only after __deepcopy__ returns. You can check that in the source code of copy.py. Here is the most important part of it without pieces, unnecessary for our case:

def deepcopy(x, memo=None, _nil=[]):
    ...

    if memo is None:
        memo = {}

    d = id(x)
    y = memo.get(d, _nil)
    if y is not _nil:
        return y

    ...

    copier = getattr(x, "__deepcopy__", None)
    if copier:
        y = copier(memo)

    ...

    # If is its own copy, don't memoize.
    if y is not x:
        memo[d] = y
        _keep_alive(x, memo) # Make sure x lives at least as long as d
    return y

So, our problem is that memo is not updated in __deepcopy__ before it calls another __deepcopy__ with the same argument. Knowing that, it's easy to fix your code with the single line of code:

def __deepcopy__(self, memo):
    new_copy = type(self)(self.val)

    memo[id(self)] = new_copy  # THIS LINE: update memo before re-entering deep-copy machinery

    new_copy._next_node = deepcopy(self._next_node, memo)
    new_copy._tail = deepcopy(self._tail, memo)
    return new_copy

Upvotes: 4

Ben Usman
Ben Usman

Reputation: 8387

It turned out that in most cases (if you don't need to explicitly exclude certain fields from your copy), you can just do deepcopy(obj) even if obj has self-links or other else nasty properties.

Upvotes: 1

Related Questions