When should I use .copy()

Question

I know that for the following:

a=1
b=a
a=4

It assigns 1 to a and then a to b followed by changing the value of a to 4 as the last step.
Here once the value of a is changed to 4, there will not be any changes to the value of b.

Similarly for

a=np.array([10,20,30,40,50,60])
b=a[2]

a[2]=100

the value of b at the end of the code will be 30, which was the initial a[2].

But when you have something like the following:

a=np.array([10,20,30,40,50,60])
b=a[0:2]

and if we change a[0:2]=[100,200], then the value of b also changes automatically. It is as if the variable b is somehow linked to a (unlike in the previous cases).
I do know that if the code is written as b=a[0:2].copy(), then b will not change even when a is changed.

So my question is: when exactly should I use the .copy() method if I don't want the second variable to change, given that it wasn't necessary in the first two cases?

Appreciate your help

Pierre D · Accepted Answer

In the first & second cases, you have assigned a single value (scalar) to b.

In the third case, you've assigned a view based on the slice 0:2 of a to b (see the NumPy documentation on basic slicing and indexing). That is essentially a multivalued reference to certain elements of a (i.e., b[0] really refers to a[0]). In other words, b does not own its data, but points to data in a. Thus, when a changes, b reflects those changes because it points to the very same elements.
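To make the view behaviour concrete, here is a small sketch using the question's own array. The name c is an assumption added for comparison; it is not in the original code.

```python
import numpy as np

a = np.array([10, 20, 30, 40, 50, 60])
b = a[0:2]          # basic slice: a view onto a's data
c = a[0:2].copy()   # independent copy (c is a name added for this sketch)

a[0:2] = [100, 200]
print(b)  # [100 200]  the view follows the change
print(c)  # [10 20]    the copy does not

# The link works in both directions: writing through the view changes a.
b[1] = -1
print(a[1])            # -1

# A view remembers the array it was derived from; a copy has no base.
print(b.base is a)     # True
print(c.base is None)  # True
```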

In general, you should use numpy copy() when you want to dissociate the copy from the original, for instance:

when you are about to make modifications to the copy (not cool to sneakily mutate the caller's data, for example),
when you want to keep a snapshot of the array at that point, no matter what happens to the original later,
when your slicing is very sparse or in a random order, and you want a more compact copy for faster operations (locality of reference),
when you'd like to change the flags of a or b (e.g. from C-contiguous to F-contiguous, again usually for speed reasons).
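The cases above might look like this in practice. This is only a sketch: scale_to_unit_max, data, snapshot and grid are hypothetical names made up for illustration.

```python
import numpy as np

def scale_to_unit_max(values):
    # Hypothetical helper: copy first, so the in-place division below
    # does not modify the caller's array.
    out = values.copy()
    out /= out.max()
    return out

data = np.array([10.0, 20.0, 40.0])
scaled = scale_to_unit_max(data)   # data itself is left untouched

# Snapshot: keep the current state regardless of later changes.
snapshot = data.copy()
data[0] = -1.0                     # snapshot[0] is still 10.0

# Memory layout: a column of a C-ordered 2-D array is a strided view;
# copying it yields a compact, contiguous array.
grid = np.arange(12).reshape(3, 4)
col_view = grid[:, 1]
col_copy = grid[:, 1].copy()
print(col_view.flags.c_contiguous)  # False
print(col_copy.flags.c_contiguous)  # True

# copy() also accepts an order argument to switch to Fortran order.
grid_f = grid.copy(order="F")
print(grid_f.flags.f_contiguous)    # True
```

Whether the contiguous copy is actually faster depends on the array size and the operations that follow, so it is worth timing before relying on it.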

With regard to your question in the comments, "is there any way to identify whether the variables are linked or not": yes. Each numpy array has flags:

a=np.array([10,20,30,40,50,60])
b=a[2]
b.flags.owndata
# True  -- funny given that b is a single numpy.int64, but it still has flags

b=a[0:2]
b.flags.owndata
# False

a=np.array([10,20,30,40,50,60])
b=a[0:2].copy()
b.flags.owndata
# True
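One caveat: owndata only says that an array borrows its memory, not which array it borrows from. If you want to check a specific pair of arrays, np.shares_memory may be a more direct test. A short sketch follows; view, fancy and reshaped are names assumed for illustration.

```python
import numpy as np

a = np.array([10, 20, 30, 40, 50, 60])

view = a[0:2]        # basic slice: a view
fancy = a[[0, 1]]    # indexing with a list of integers returns a copy

print(np.shares_memory(a, view))   # True
print(np.shares_memory(a, fancy))  # False
print(fancy.flags.owndata)         # True

# owndata can be False even when no other name refers to the data:
# reshape returns a view of the temporary array created by arange.
reshaped = np.arange(6).reshape(2, 3)
print(reshaped.flags.owndata)          # False
print(np.shares_memory(a, reshaped))   # False
```

So an explicit .copy() is redundant after integer-array (or boolean) indexing, which already produces a fresh array, but it is needed after a basic slice.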
