Reputation: 1073

Strange behavior of "is" with np.ndarray elements

The built-in "is" operator behaves strangely for elements of an np.ndarray.

Although the ids of the lhs and rhs are the same, the "is" operator returns False (this behavior is specific to np.ndarray).

import numpy as np

a = np.array([1.,])
b = a.view()
print(id(a[0]) == id(b[0]))  # True
print(a[0] is b[0])  # False

This happens even without creating a view:

a = np.array([1.,])
print(a[0] is a[0])  # False

Does anyone know the mechanism behind this behavior (ideally with some evidence or a reference to the specification)?

P.S. Please reconsider the two examples.

With a list, this does not happen:

a = [0., 1., 2.,]
b = []
b.append(a[0])
print(a[0] is b[0])  # True

a[0] and b[0] refer to the exact same object.

a = np.array([1.,])
b = a.view()
b[0] = 0.
print(a[0])  # 0.0
print(id(a[0]) == id(b[0]))  # True

Note: This question may be a duplicate, but I'm still a bit confused.

a = np.array([1.,])
b = a.view()
x = a[0]
y = b[0]
print(id(a[0]))  # 139746064667728
print(id(b[0]))  # 139746064667728
print(id(a[0]) == id(b[0])) # True
print(id(a[0]) == id(x)) # False
print(id(x) == id(y))  # False

Is a[0] a temporary object?
Is the id of a temporary object reused?
Doesn't this contradict the specification? (https://docs.python.org/3.7/reference/expressions.html#is)

6.10.3. Identity comparisons
The operators is and is not test for object identity: x is y is true if and only if x and y are the same object. Object identity is determined using the id() function. x is not y yields the inverse truth value.

If ids are reused for temporary objects, why are the ids different in this case?

>>> id(100000000000000000 + 1) == id(100000000000000001)
True
>>> id(100000000000000000 + 1) == id(100000000000000000)
False

Upvotes: 3

Answers (5)

hpaulj

Reputation: 231475

A big part of the confusion here is the nature of a[0] in the case of an array.

For a list, b[0] is an actual element of b. We can illustrate this by making a list of mutable items (other lists):

In [22]: b = [[0],[1],[2],[3]]
In [23]: b1 = b[0]
In [24]: b1
Out[24]: [0]
In [25]: b[0].append(10)
In [26]: b
Out[26]: [[0, 10], [1], [2], [3]]
In [27]: b1
Out[27]: [0, 10]
In [28]: b1.append(20)
In [29]: b
Out[29]: [[0, 10, 20], [1], [2], [3]]

Mutating b[0] and mutating b1 act on the same object.

For an array:

In [35]: a = np.array([0,1,2,3])
In [36]: c = a.view()
In [37]: a1 = a[0]
In [38]: a += 1
In [39]: a
Out[39]: array([1, 2, 3, 4])
In [40]: c
Out[40]: array([1, 2, 3, 4])
In [41]: a1
Out[41]: 0

An in-place change to a does not change a1, even though it does change c.

__array_interface__ shows us where the data buffer for an array is stored; think of it, in a loose sense, as the memory address of that buffer.

In [42]: a.__array_interface__['data']
Out[42]: (31233216, False)
In [43]: c.__array_interface__['data']
Out[43]: (31233216, False)
In [44]: a1.__array_interface__['data']
Out[44]: (28513712, False)

The view c has the same data buffer, but a1 does not. By contrast, a[0:1] is a single-element view of a, and does share the data buffer.

In [45]: a[0:1].__array_interface__['data']
Out[45]: (31233216, False)
In [46]: a[1:2].__array_interface__['data']  # 8 bytes over
Out[46]: (31233224, False)

So id(a[0]) tells us next to nothing about a. Comparing ids only tells us something about how memory slots are recycled, or not, when constructing Python objects.
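The same distinction between a[0] and a[0:1] can be checked without reading raw buffer addresses. A minimal sketch using np.shares_memory and the .base attribute (the names scalar and window are just for illustration): the integer-indexed element keeps its old value after an in-place update, while the slice sees the change.

```python
import numpy as np

a = np.array([0, 1, 2, 3])
scalar = a[0]    # integer indexing returns a NumPy scalar holding a copy of the value
window = a[0:1]  # slicing returns a one-element view into a's data buffer

a += 1           # in-place update of a's data buffer

print(scalar)                       # 0: the scalar kept its own copy
print(window[0])                    # 1: the view sees the updated buffer
print(window.base is a)             # True: the view's memory belongs to a
print(np.shares_memory(a, window))  # True
```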

Upvotes: 0

Eric

Reputation: 6066

NumPy stores array data as a raw data buffer. When you access the data, as in a[0], it reads from the buffer and constructs a Python object for the value. Thus, evaluating a[0] twice constructs two Python objects. is checks for identity, so two different objects compare False.

This illustration should make the process much clearer:

NOTE: the id numbers below are sequential and serve only as examples; in practice you'd get large, arbitrary-looking numbers. The repeated id 3s may also not always be the same number. It's just possible that they are, because id 3 is repeatedly freed and thus reusable.

import numpy as np

a = np.array([1.,])
b = a.view()
x = a[0]    # Python reads a[0] and creates a new object, id 1.
y = b[0]    # Python reads b[0] (the same buffer as a[0]) and creates a new object, id 2 (id 1 is used by x).

print(id(a[0]))  # Python reads a[0] and creates a new object, id 3.
                 # After this call, the id-3 object a[0] is no longer used.
                 # Its lifetime has ended and id 3 is freed.

print(id(b[0]))  # Python reads b[0] (the same buffer as a[0]) and creates a new object, id 3.
                 # id 3 has been freed and is reusable.
                 # After this call, the id-3 object b[0] is no longer used.
                 # Its lifetime has ended and id 3 is freed (again).

print(id(a[0]) == id(b[0])) # This runs in 2 steps.
                            # First, id(a[0]) runs. Just like above, it creates an object with id 3.
                            # Then a[0] is disposed of, since no references to it are kept. id 3 is freed again.
                            # Then id(b[0]) runs. Again, it creates an object with id 3 (since id 3 is free).
                            # So id(a[0]) == 3 and id(b[0]) == 3. They are equal.

print(id(a[0]) == id(x)) # Following the same steps, id(a[0]) may create an object with id 3, while x still references the id-1 object. 3 != 1.

print(id(x) == id(y))  # x references the id-1 object, y references the id-2 object. 1 != 2.
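The rule this walkthrough relies on can be checked directly. In the sketch below, x and y stay bound to names, so both objects are alive at the same time and cannot share an id. The last line compares ids of temporaries whose lifetimes need not overlap, so its result depends on how the interpreter recycles memory and is not guaranteed either way.

```python
import numpy as np

a = np.array([1.0])
b = a.view()

# Bind each element to a name so both objects stay alive at the same time.
x = a[0]
y = b[0]

print(id(x) == id(y))  # False: two live objects never share an id
print(x is y)          # False: they are distinct objects
print(x == y)          # True: they hold the same value

# Here a[0] can be freed before b[0] is created, so its id may be reused;
# whether this prints True or False is an implementation detail.
print(id(a[0]) == id(b[0]))
```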

Regarding

>>> id(100000000000000000 + 1) == id(100000000000000001)
True
>>> id(100000000000000000 + 1) == id(100000000000000000)
False

id allocation and garbage collection are implementation details. What is guaranteed is that an object's id is unique and constant during its lifetime: two objects that are alive at the same time have different ids, and the same object always has the same id. The problem is that some expressions are not atomic (i.e. they do not run at a single point in time), so the objects being compared may not be alive at the same time.

Python may or may not reuse freed id numbers, depending on the implementation. In this case, it reused one and not the other. (It's likely that in id(100000000000000000 + 1) == id(100000000000000001), Python realises that the number is the same and can reuse it efficiently, because 100000000000000001 would be in the same location in memory.)
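One CPython-specific detail that may also be at play (an assumption on my part, not something established above): the compiler folds constant expressions such as 100000000000000000 + 1 at compile time and stores equal constants only once per code object. If so, both sides of the first comparison are literally the same object, and no id reuse is involved at all. A sketch to check this on your own interpreter:

```python
# Compile both assignments into one code object, as the REPL does for a single line.
ns = {}
exec("folded = 100000000000000000 + 1\nliteral = 100000000000000001", ns)

# On CPython, the folded result and the literal are expected to be one shared constant.
print(ns["folded"] is ns["literal"])

# The constants table of the compiled expression shows what the compiler kept.
code = compile("id(100000000000000000 + 1) == id(100000000000000001)", "<expr>", "eval")
print(code.co_consts)
```

This behaviour is specific to CPython's compiler, so other implementations may print something different.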

Upvotes: 0

Inder

Reputation: 3826

This is simply due to the difference in how is and == work: the is operator doesn't compare values; it only checks whether the two operands refer to the same object.

For example if you do:

print(a is a)

The output will be True. For more information, look here.

When Python evaluates the comparison, it creates a separate object for each operand, and the same behaviour can be observed with a simple test using the id function:

print(id(a[0]), a[0] is a[0], id(a[0]))

The output will be:

140296834593128 False 140296834593248

As for the other part of your question, why lists don't behave the way NumPy arrays do: it comes down to how they are built. NumPy arrays were designed to be more efficient in processing and in storage than a normal Python list; they keep raw numeric values in a buffer rather than references to Python objects.

So every time you index a NumPy array, the element is loaded into a new object with a different id, as you can observe from the following code:

a = np.array([0., 1., 2.,])
b = []
b.append(a[0])
print(id(a[0]), a[0] is b[0], id(b[0]))

Here are the outputs of multiple re-runs of the same code in jupyter-lab:

140296834595096 False 140296834594496
140296834595120 False 140296834594496
140296834595120 False 140296834594496
140296834595216 False 140296834594496
140296834595288 False 140296834594496

Notice something strange? The id of the NumPy element is different on each re-run, whereas the id of the element stored in the list stays the same. This explains the strange behaviour for NumPy arrays in your question.
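The underlying difference in storage can be shown in a few lines. A minimal sketch (value, lst and arr are illustrative names): a list holds references to existing Python objects, while a float64 array copies the raw 8-byte value into its buffer, so each indexing operation has to box it into a new numpy.float64.

```python
import numpy as np

value = 1.5
lst = [value]            # the list stores a reference to the float object
arr = np.array([value])  # the array copies the value into its own buffer

print(lst[0] is value)   # True: the same object comes back out
print(arr[0] is value)   # False: indexing creates a new numpy.float64
print(arr.dtype)         # float64
print(arr.itemsize)      # 8: bytes per element in the buffer
```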

If you want to read more about this behaviour, I suggest the NumPy docs.

Upvotes: 3

ivan_pozdeev

Reputation: 36046

This is covered by the question "id() vs `is` operator. Is it safe to compare `id`s? Does the same `id` mean the same object?". In this particular case:

a[0] and b[0] are created anew each time
```
In [7]: a[0] is a[0]
Out[7]: False
```
In id(a[0]) == id(b[0]), each object is immediately discarded after taking its id, and b[0] just happened to take up the id of the recently-discarded a[0]. Even if this happens each time in your version of CPython for this particular expression (due to a specific evaluation order and heap organization), this is an implementation detail and you can't rely on it.

Upvotes: 1

xashru

Reputation: 3580

a[0] is of type <class 'numpy.float64'>. When you do the comparison, it creates two instances of the class, so the is check fails. However, if you do the following, you will get what you wanted, because now both operands reference the same object.

x = a[0]
print(x is x)  # True
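Extending that example: keeping a reference is what makes is meaningful, while comparing element values is a job for ==. A short sketch:

```python
import numpy as np

a = np.array([1.0])
x = a[0]

print(type(x))       # <class 'numpy.float64'>
print(x is x)        # True: one object, referenced twice
print(a[0] is a[0])  # False: each a[0] builds a new instance
print(a[0] == x)     # True: the values are equal
```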

Upvotes: 1
