wolf

Reputation: 63

Apparently inconsistent behavior when copying numpy arrays

Consider the following minimal example. Can somebody explain the apparently inconsistent logic of numpy when it comes to copying list elements of varying nesting depths?

import numpy as np

L = [[[[1, 1], 2, 3]]]
A1 = np.array(L)

A2 = A1.copy()

A1[0][0][2] = 'xx'
A1[0][0][0][0] = 'yy'

print "\nA1 after changes:\n{}".format(A1)
print "\nA2 only partially changed:\n{}".format(A2)

Results:

A1 after changes:
[[[['yy', 1] 2 'xx']]]

A2 only partially changed:
[[[['yy', 1] 2 3]]]

Then:

>>> print A1[0][0][2] == A2[0][0][2]
False
>>> print A1[0][0][0][0] == A2[0][0][0][0]
True

I have a hard time explaining to myself why 3 is not replaced, but 1 in a deeper level is.

  1. A2 = np.array(A1, copy=True) and A2 = np.empty_like(A1); np.copyto(A2, A1) behave the same as the code above

  2. A2 = A1[:] behaves the same as A2 = A1: both are identical to A1 after the changes

  3. import copy; A2 = copy.deepcopy(A1) is the only solution I found to create an independent copy.

Upvotes: 0

Views: 214

Answers (1)

hpaulj

Reputation: 231385

Look at your array, and understand its structure first:

In [139]: A1
Out[139]: array([[[[1, 1], 2, 3]]], dtype=object)

In [140]: A1.shape
Out[140]: (1, 1, 3)

It's a dtype=object array; that is, its elements are pointers to Python objects, not numbers. It is also 3d, with shape (1, 1, 3), so it holds 3 elements.

In [142]: A1[0,0]  
Out[142]: array([[1, 1], 2, 3], dtype=object)

Since it is an array, A1[0,0] is better than A1[0][0]. They are functionally the same, but the first is clearer. A1[0,0,:] is even better. In any case, at this level we still have an array with shape (3,), i.e. 1d with 3 elements.

In [143]: A1[0,0,0]
Out[143]: [1, 1]

In [144]: A1[0,0,2]
Out[144]: 3

Now we get a list and numbers, the individual elements of A1. The list is mutable, the number is not.

We can change the 3rd element (a number) to a string:

In [148]: A1[0,0,2]='xy'

To change an element of the 1st element, a list, I have to mix array indexing with list indexing; 4-level array indexing does not work.

In [149]: A1[0,0,0,0]
...
IndexError: too many indices for array

In [150]: A1[0,0,0][0]='yy'

In [151]: A1
Out[151]: array([[[['yy', 1], 2, 'xy']]], dtype=object)

A1 is still a 3d object array; we have just changed a couple of elements. The 'xy' change is different from the 'yy' change. One changed the array, the other changed a list that is an element of the array.

A2=A1.copy() makes a new array with its own copy of A1's data buffer. For an object array that buffer holds pointers, so the copy is shallow: A2 has pointers to the same objects as A1.

The 'xy' change replaced a pointer in A1, but did not touch the A2 copy.

The 'yy' change modified the list pointed to by A1. A2 has a pointer to the same list, so it sees the change.
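The two kinds of change can be checked directly with identity tests. This is a minimal sketch in Python 3; it passes dtype=object explicitly, on the assumption that you are running a recent NumPy release, which no longer infers it for ragged input like L.

```python
import numpy as np

L = [[[[1, 1], 2, 3]]]
# dtype=object is spelled out; recent NumPy will not infer it for ragged input
A1 = np.array(L, dtype=object)
A2 = A1.copy()

# The copy has its own buffer of pointers ...
print(np.shares_memory(A1, A2))
# ... but both buffers point at the same list object.
print(A2[0, 0, 0] is A1[0, 0, 0])

A1[0, 0, 2] = 'xy'       # replaces one pointer in A1's buffer only
A1[0, 0, 0][0] = 'yy'    # mutates the list that both arrays point to

print(A2[0, 0, 2])       # unchanged element of the copy
print(A2[0, 0, 0])       # shared list, so the change is visible
```

The first pair of prints should show False and True; after the two assignments, A2 still holds 3 in the last slot but shows the modified list.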

Note that L, the original nested list sees the same change:

In [152]: L
Out[152]: [[[['yy', 1], 2, 3]]]

A3 = A1[:] produces a view of A1. A3 has the same data buffer as A1, so it sees all the changes.

A4 = A1 would also see the same changes, but A4 is a new reference to A1, not a view or a copy.
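The difference between a reference, a view and a copy can be inspected with `is` and the `.base` attribute. A short sketch, reusing the names A2, A3 and A4 from the discussion above (Python 3, dtype=object given explicitly):

```python
import numpy as np

A1 = np.array([[[[1, 1], 2, 3]]], dtype=object)

A4 = A1          # another name for the same array object
A3 = A1[:]       # a new array object that views A1's buffer
A2 = A1.copy()   # a new array object with its own buffer

print(A4 is A1)          # same object
print(A3 is A1)          # different object ...
print(A3.base is A1)     # ... that borrows A1's data
print(A2.base is None)   # the copy owns its data

A1[0, 0, 2] = 'xy'
print(A3[0, 0, 2], A4[0, 0, 2], A2[0, 0, 2])
```

After the assignment, the view and the reference show 'xy', while the copy still shows 3.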

The duplicate answer that was raised earlier dealt with references, copies and deep copies of lists. That is relevant here because L is a list, and A1 is an object array, which in many ways is an array wrapper around a list. But A1 is also a numpy array, which adds the distinction between view and copy.

This is not a good use of numpy arrays, not even the object dtype version. It's an instructive example, but too confusing to be practical. If you need to do a deepcopy on an array, you probably are using arrays wrong.
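For completeness, here is what the deepcopy route from the question looks like. This is only a sketch; the name A5 is made up for illustration, and dtype=object is again passed explicitly for recent NumPy.

```python
import copy
import numpy as np

A1 = np.array([[[[1, 1], 2, 3]]], dtype=object)
A5 = copy.deepcopy(A1)   # A5 is a hypothetical name; copies the array and the objects it points to

print(A5[0, 0, 0] is A1[0, 0, 0])   # the inner list was duplicated too

A1[0, 0, 2] = 'xy'
A1[0, 0, 0][0] = 'yy'

print(A1)
print(A5)   # neither change reaches the deep copy
```

If the data were regular, e.g. a plain numeric array of shape (1, 1, 3), A1.copy() alone would already give an independent array, which is why needing deepcopy is a hint that an object array is the wrong container.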

Upvotes: 1
