Are NumPy ndarrays truly mutable?

Question

Here is my first code snippet. When run, it doesn't throw an assertion error.

import numpy as np

this_arr = np.ones(10)
next_arr = this_arr
next_arr *= 2
assert np.array_equal(this_arr, next_arr)

Here is my second code snippet. When run, it does throw an assertion error.

import numpy as np

this_arr = np.ones(10)
next_arr = this_arr
next_arr = next_arr * 2
assert np.array_equal(this_arr, next_arr)

This behavior is confusing to me.

My understanding of the first code snippet is that I initialize the name this_arr to point to the value at some memory location. Then I initialize the name next_arr to point to the same value at the same memory location. Therefore, when I change the value pointed to by next_arr, the value pointed to by this_arr should also change. This behavior is the "Mutable-Presto-Chango," a term coined by Ned Batchelder.

However, the second code snippet does not behave this way. At first, I thought that maybe the *= operator somehow doesn't change the value's location in memory while the * operator does. But then I went back through the first snippet and found that the memory locations of this_arr and next_arr are different there too! Given that, how does the program "know" to change the values of this_arr to match those of the changed next_arr? And why doesn't the program "know" to change the values in the second code snippet?

Edit: As a follow-up question: even though next_arr and this_arr have different memory locations, is there some underlying connection between the two that Python has initialized?

Thanks!

hpaulj · Accepted Answer

I prefer to talk in terms of objects and references, rather than values. So I would describe your first code as:

This creates an ndarray object and assigns it (or a reference to it) to this_arr:

this_arr = np.ones(10)

Then this assigns the same reference to next_arr:

next_arr = this_arr

So next_arr and this_arr reference the same object.
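A quick way to confirm this is the `is` operator, which compares object identity directly (a minimal check, reusing the names from the question):

```python
import numpy as np

this_arr = np.ones(10)
next_arr = this_arr  # no copy is made: both names now reference one ndarray

# `is` tests identity, so True here means there is a single shared object
print(next_arr is this_arr)          # True
print(id(next_arr) == id(this_arr))  # True, checked while both names are alive
```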

Then do an 'in-place' change to the array object. It doesn't matter which name is used.

next_arr *= 2

The two names still reference the same array object. (Under the covers, *= does some buffering, but the array object and the data buffer location remain the same.) Another mutable change would be next_arr[1] = 10 (this would be true for list objects as well).
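The following sketch checks both claims: the object's id and the data-buffer address (read from `__array_interface__`, discussed below) are unchanged after `*=` and after item assignment, and the list case behaves the same way. The variable names `id_before`, `data_before`, `lst_a`, and `lst_b` are just illustrative.

```python
import numpy as np

this_arr = np.ones(10)
next_arr = this_arr

id_before = id(next_arr)
# __array_interface__["data"] is an (address, read_only) tuple; [0] is the address
data_before = next_arr.__array_interface__["data"][0]

next_arr *= 2      # in-place: results are written back into the existing buffer
next_arr[1] = 10   # item assignment also mutates the shared object

print(id(next_arr) == id_before)                               # True
print(next_arr.__array_interface__["data"][0] == data_before)  # True
print(this_arr[:3])  # both changes are visible through this_arr

# The same item assignment on a list mutates the one shared list object
lst_a = [1, 1, 1]
lst_b = lst_a
lst_b[1] = 10
print(lst_a)  # [1, 10, 1]
```

For ndarrays, `next_arr *= 2` is roughly the same as the explicit `np.multiply(next_arr, 2, out=next_arr)`, which makes the "write into the existing array" intent visible.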

With

next_arr = next_arr * 2

the multiplication makes a new array object. That is assigned to next_arr, breaking any link with the previously referenced object (which this_arr still references).
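A minimal sketch of the second snippet with identity checks added shows the rebinding: after the assignment the two names point at different objects, and the original array is untouched.

```python
import numpy as np

this_arr = np.ones(10)
next_arr = this_arr

next_arr = next_arr * 2  # `*` builds a brand-new array; the name is rebound to it

print(next_arr is this_arr)                # False: two separate objects now
print(this_arr[0], next_arr[0])            # 1.0 2.0
print(np.array_equal(this_arr, next_arr))  # False, hence the AssertionError
```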

If id(this_arr) and id(next_arr) are the same, then they reference the same object. Roughly, the id is a location, but it is not the same as a pointer in C. But be wary about comparing ids over time; they may be reused.
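To illustrate the reuse caveat: ids are only guaranteed unique among objects that exist at the same time, so comparing the ids of short-lived temporaries can give a misleading match. The exact outcome of the last line depends on the interpreter and memory allocator, so no result is claimed for it.

```python
import numpy as np

a = np.ones(3)
b = a
print(a is b)           # True: the direct identity test
print(id(a) == id(b))   # True: both objects are alive when compared

# Hazard: the first temporary may be freed before the second is created,
# so the two can end up with the same id. The result is implementation-dependent.
print(id(np.ones(3)) == id(np.ones(3)))
```

Using `a is b` while both names are in scope avoids the timing problem entirely.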

arr.__array_interface__ is another handy tool. It has a data key that tells us where the underlying data buffer of an array is located. But to understand that, you need to know something about how arrays are stored, and about the distinction between a view and a copy.
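As a sketch of that distinction (the helper `buffer_address` is a hypothetical name, not a NumPy function): a view is a different ndarray object that shares its parent's data buffer, while a copy gets its own buffer. This is the one case where "different objects" can still be connected through the same underlying data.

```python
import numpy as np

def buffer_address(a):
    """Start address of an array's data buffer, taken from __array_interface__."""
    return a.__array_interface__["data"][0]

arr = np.arange(6)
view = arr[:]      # basic slicing returns a view: a new ndarray sharing arr's buffer
copy = arr.copy()  # a new ndarray with its own, independent buffer

print(view is arr)                                  # False: two distinct objects
print(buffer_address(view) == buffer_address(arr))  # True: same data buffer
print(buffer_address(copy) == buffer_address(arr))  # False: separate buffer
print(view.base is arr)                             # True: view borrows arr's data

view[0] = 100      # writing through the view...
print(arr[0])      # ...shows up in arr as 100
```

The data key only gives the start address, so a view that begins partway into the buffer (such as arr[2:]) reports a different address while still sharing memory; `np.shares_memory(a, b)` is a more direct test for overlap.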
