Reputation: 154
I know the difference between shallow and deep copy in Python, and the question is not about when to use one or the other. However I find this trivial example pretty amusing and non-intuitive
from copy import deepcopy
a=0
b=deepcopy(a)
c=a
a+=1
print(a,b,c)
output: 1 0 0
from copy import deepcopy
a=[0]
b=deepcopy(a)
c=a
a[0]+=1
print(a,b,c)
output: [1] 0 [1]
I would like to know the reason why this design choice was made, since the two snippets of code are in my opinion quite equivalent however their output is completely different. To make myself more explicit I wonder why = is a deepcopy in the case of a "primitive" variable and a shallow copy in the case of a "non-primitive" (but still part of the basic language) variable like a list? I personally find this behavior counter-intuitive Note: I used python 3
Upvotes: 2
Views: 1192
Reputation: 5965
The catch here is mutability and immutability.
There is no such thing is primitive and non primitive in python, everything is a type, some are just inbuilt.
You need to understand how python stores data in variables. Assuming you come from a C background, you can think of all python variables are pointers.
All python variables store the reference to the location where the value of the variable actually is.
The builtin id
function somewhat lets us look into where the value of a variable is actually stored.
>>> x = 12345678
>>> id(x)
1886797010128
>>> y = x
>>> id(y)
1886797010128
>>> y += 1
>>> y
12345679
>>> x
12345678
>>> id(y)
1886794729648
The variable x
points to location 1886797010128
and location 1886797010128
holds the value of 10
. int
is an immutable type in python, which means that the data stored in the location 1886797010128
can't be changed.
When we assign y = x
, y
now also points to the same address, since it's not necessary to allocate more memory for the same value.
When y
is changed (remember that int
is an immutable type and it's value can't be changed), a new int is created in the new location 1886794729648
and y
now points to this new int object at the new address.
The same happens when you try to update the value of a variable that holds immutable data.
>>> id(x)
140707671077744
>>> x = 30
>>> id(x)
140707671078064
Changing the value of a variable that has immutable data simply makes the variable point to a new object with the updated value.
This is not the case with mutable types like list
.
>>> a = [1, 2, 3]
>>> b = a
>>> id(a), id(b)
(1886794896456, 1886794896456)
>>> b.append(4)
>>> a
[1, 2, 3, 4]
>>> b
[1, 2, 3, 4]
>>>
a
is a list
and is mutable, changing it using methods like append
will actually mutate the value at address 1886794896456
. Since b
also points to the same address, the value of a
also gets updated.
A deepcopy
creates a new object at a different memory location with the same value as its parameter, i.e. the object passed to it.
I would like to know the reason why this design choice was made
This is simply because of how python is designed as an object oriented language. Similar behavior can be seen in java objects.
I personally find this behavior counter-intuitive
Intuition comes from practice. Practicing one language cannot help with how other languages work, there are different design patterns and conventions for different languages and I think some amount of effort should be put in to learn what they are for the language we are about to use.
Upvotes: 4
Reputation: 280973
c = a
is neither a shallow copy nor a deep copy, no matter what a
refers to. It's even shallower than that - it only copies a reference. Both c
and a
hold references to the same object after this assignment.
It is not possible to modify the value of an int in Python. When you use +=
on an int, Python assigns a (reference to a) new int to wherever you retrieved the original int from.
For the first case, a += 1
reassigns the a
variable, while b
and c
continue to refer to the ints they referred to before the assignment.
For the second case, a[0] += 1
reassigns cell 0 of the list a
refers to. b
continues to refer to the copy, which is unchanged, and c
continues to refer to the same list a
refers to. Since this list has changed state, the change is visible through the c
variable.
Incidentally, deepcopy
is designed to produce a deep copy, in the sense that (arbitrarily deep) modifications to the return value will not modify the argument, and vice versa. Since it is not possible to modify the value of an int in Python, an int counts as a (deep) copy of itself, and indeed, the deepcopy
implementation simply returns its argument if its argument is an int.
>>> x = 1000
>>> copy.deepcopy(x) is x
True
Upvotes: 1
Reputation: 2453
It is commonplace to link objects between them during copies and not primitives.
The difference between your snippets is that, in the second one, c is a copy of the list a, and a list is an object, so they are linked. Whereas c was a "copy" of a primitive in the first snippet, which doesn't link.
Upvotes: -1