Some confusion about how NumPy arrays are stored in Python

Question

I am confused by some behaviour I see when playing with NumPy arrays in Python.

Question 1

I execute the following statements in the Python interpreter:

>>> import numpy as np
>>> L = [1000,2000,3000]
>>> A = np.array(L)
>>> B = A

Then I check the following things:

>>> A is B
True
>>> id(A) == id(B)
True
>>> id(A[0]) == id(B[0])
True

That's fine. But then something strange happens.

>>> A[0] is B[0]
False

But how can A[0] and B[0] be different things? They have the same id! For a list in Python, we have

>>> LL = [1000,2000,3000]
>>> SS = LL
>>> LL[0] is SS[0]
True

Is the way a NumPy array is stored totally different from a list? And we also have

>>> A[0] = 1001
>>> B[0]
1001

It seems that A[0] and B[0] are the identical object.

Question 2

I make a copy of A.

>>> C = A[:]
>>> C is A
False
>>> C[0] is A[0]
False

That is fine. A and C seem to be independent of each other. But

>>> A[0] = 1002
>>> C[0]
1002

It seems that A and C are not independent? I am totally confused.

Sven Marnach · Accepted Answer

You are asking two completely independent questions, so here are two answers.

The data of a NumPy array is internally stored as a contiguous C array. Each entry in the array is just a number. Python objects, on the other hand, require some housekeeping data, e.g. the reference count and a pointer to the type object. You can't simply have a raw pointer to a number in memory. For this reason, NumPy "boxes" a number in a Python object whenever you access an individual element. This happens every time you access an element, so even A[0] and A[0] are different objects:
```
>>> A[0] is A[0]
False
```
This is at the heart of why NumPy can store arrays in a more memory-efficient way: it does not store a full Python object for each entry, and only creates these objects on the fly when needed. It is optimised for vectorised operations on the array, not for individual element access.
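
A small sketch of the boxing, if it helps. The names x and y are just for illustration. Holding both results in variables keeps them alive at the same time; my guess is that the equal id() values in the question come from short-lived temporaries whose memory gets reused, since ids are only guaranteed unique among objects that exist simultaneously.
```python
import numpy as np

A = np.array([1000, 2000, 3000])

# Each element access builds a fresh NumPy scalar object from the raw buffer.
x = A[0]
y = A[0]

is_same_object = x is y          # two separate boxes, so this is False
is_equal_value = bool(x == y)    # the boxed values still compare equal

# The boxed element is a NumPy scalar type, not a plain Python int.
is_numpy_scalar = isinstance(x, np.integer)

# The buffer stores itemsize bytes per element, with no per-element
# Python object header.
bytes_per_element = A.itemsize
total_bytes = A.nbytes
```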

When you execute C = A[:], you are creating a new view of the same data. You are not making a copy. You will then have two different wrapper objects, pointed to by A and C respectively, but they are backed by the same buffer. The base attribute of an array refers to the array object it was originally created from:
```
>>> A.base is None
True
>>> C.base is A
True
```
New views on the same data are particularly useful when combined with indexing, since you can get views that only include some slice of the original array, but are backed by the same memory.
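
For instance, a partial slice behaves the same way. This is only a minimal sketch; the name middle is made up for the example:
```python
import numpy as np

A = np.array([1000, 2000, 3000, 4000, 5000])

# Basic slicing returns a view: a new array object over the same buffer.
middle = A[1:4]

# Writing through the view changes the original array.
middle[0] = -1

view_base_is_A = middle.base is A
original_after_write = A.tolist()
```
After the assignment, A[1] holds -1 as well, because middle[0] and A[1] refer to the same location in memory.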

To actually make a copy of an array, use the copy() method.
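
Applied to the example from the question, a sketch of what that looks like:
```python
import numpy as np

A = np.array([1000, 2000, 3000])

# copy() allocates a new buffer and copies the data into it.
C = A.copy()

# Changing A no longer affects C.
A[0] = 1002

a_first = int(A[0])
c_first = int(C[0])

# A copy owns its data, so it has no base array.
copy_has_no_base = C.base is None
```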

As a more general remark, you should not read too much into object identity in Python. In general, if x is y is true, you know that they are really the same object. However, if this returns false, they can still be two different proxies to the same object.
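
If what you actually want to know is whether two arrays are backed by the same data, one option is to ask NumPy directly instead of relying on is. A rough sketch using np.shares_memory (the names B, C and D are just for the example):
```python
import numpy as np

A = np.array([1000, 2000, 3000])
B = A          # another name for the same array object
C = A[:]       # a different array object over the same buffer
D = A.copy()   # a different array object with its own buffer

# Object identity only tells you about the wrapper objects.
b_is_a = B is A
c_is_a = C is A
d_is_a = D is A

# np.shares_memory reports whether the underlying data overlaps.
c_shares = np.shares_memory(A, C)
d_shares = np.shares_memory(A, D)
```
Here C is A is false even though C and A share memory, which is exactly the "two proxies to the same object" case.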
