user31039
user31039

Reputation: 6569

If I want non-recursive deep copy of my object, should I override copy or deepcopy in Python?

An object of my class has a list as its attribute. That is,

class T(object):
    def __init__(self, x, y):
        self.arr = [x, y]

When this object is copied, I want a separate list arr, but a shallow copy of the content of the list (e.g. x and y). Therefore I decide to implement my own copy method, which will recreate the list but not the items in it. But should I call this __copy__() or __deepcopy__()? Which one is the right name for what I do, according to Python semantics?

My guess is __copy__(). If I call deepcopy(), I would expect the clone to be totally decoupled from the original. However, the documentation says:

A deep copy constructs a new compound object and then, recursively, inserts copies into it of the objects found in the original.

It is confusing by saying "inserts copies into it" instead of "inserts deep copies into it", especially with the emphasis.

Upvotes: 7

Views: 987

Answers (3)

David Zwicker
David Zwicker

Reputation: 24298

I think overwriting __copy__ is a good idea, as explained by the other answers with great detail. An alternative solution might be to write an explicit copy method to be absolutely clear on what is called:

class T(object):
    def __init__(self, x, y):
        self.arr = [x, y]

    def copy(self):
        return T(*self.arr)

When any future reader sees t.copy() he then immediately knows that a custom copy method has been implemented. The disadvantage is of course that it might not play well with third-party libraries that use copy.copy(t).

Upvotes: 1

MSeifert
MSeifert

Reputation: 152667

It actually depends on the desired behaviour of your class that factor into the decision what to override (__copy__ or __deepcopy__).

In general copy.deepcopy works mostly correctly, it just copies everything (recursivly) so you only need to override it if there is some attribute that must not be copied (ever!).

On the other hand one should define __copy__ only if users (including yourself) wouldn't expect changes to propagate to copied instances. For example if you simply wrap a mutable type (like list) or use mutable types as implementation detail.

Then there is also the case that the minimal set of attributes to copy isn't clearly defined. In that case I would also override __copy__ but maybe raise a TypeError in there and possible include one (or several) dedicated public copy methods.

However in my opinion the arr counts as implementation detail and thus I would override __copy__:

class T(object):
    def __init__(self, x, y):
        self.arr = [x, y]

    def __copy__(self):
        new = self.__class__(*self.arr)
        # ... maybe other stuff
        return new

Just to show it works as expected:

from copy import copy, deepcopy

x = T([2], [3])
y = copy(x)
x.arr is y.arr        # False
x.arr[0] is y.arr[0]  # True
x.arr[1] is y.arr[1]  # True

x = T([2], [3])
y = deepcopy(x)
x.arr is y.arr        # False
x.arr[0] is y.arr[0]  # False
x.arr[1] is y.arr[1]  # False

Just a quick note about expectations:

Users generally expect that you can pass an instance to the constructor to create a minimal copy (similar or identical to __copy__) as well. For example:

lst1 = [1,2,3,4]
lst2 = list(lst1)
lst1 is lst2        # False

Some Python types have an explicit copy method, that (if present) should do the same as __copy__. That would allow to explicitly pass in parameters (however I haven't seen this in action yet):

lst3 = lst1.copy()  # python 3.x only (probably)
lst3 is lst1        # False

If your class should be used by others you probably need to consider these points, however if you only want to make your class work with copy.copy then just overwrite __copy__.

Upvotes: 3

wim
wim

Reputation: 362786

The correct magic method for you to implement here is __copy__.

The behaviour you've described is deeper than what the default behaviour of copy would do (i.e. for an object which hasn't bothered to implement __copy__), but it is not deep enough to be called a deepcopy. Therefore you should implement __copy__ to get the desired behaviour.

Do not make the mistake of thinking that simply assigning another name makes a "copy":

t1 = T('google.com', 123)
t2 = t1  # this does *not* use __copy__

That just binds another name to the same instance. Rather, the __copy__ method is hooked into by a function:

import copy
t2 = copy.copy(t1)

Upvotes: 5

Related Questions