RunOrVeith

Reputation: 4805

Is it possible to keep a reference link between a list and a numpy array?

If I create a list in Python and assign it to a second variable, changes made through the first variable are visible through the second:

a = [1, 2, 3]
b = a
a[0] = 0
print(b)
>>> [0, 2, 3]

Is it possible to achieve this behavior when creating a numpy array from a list? What I want:

import numpy as np
a = [1, 2, 3]
b = np.array(a)
a[0] = 0
print(b)
>>> [ 0 2 3 ]

But what actually happens is that b is [ 1 2 3 ]. I realize this is difficult due to the dynamic resizing of lists, but if I could tell numpy that this list is never resized, it should somehow be possible. Is this behavior achievable, or am I missing some really bad drawbacks?

Upvotes: 1

Views: 960

Answers (2)

juanpa.arrivillaga

Reputation: 95957

Fundamentally, the issue is that Python lists are not really arrays. OK, CPython lists are array lists under the hood, but they are arrays of PyObject pointers, so they can hold heterogeneous data. See here for an excellent exposition on the implementation details of CPython lists. They are also resizable, and all the malloc and realloc bookkeeping is taken care of for you. However, you can achieve something like what you want if you use the vanilla Python arrays available in the array module.

>>> import numpy as np # third party
>>> import array # standard library module

Let's make a real array:

>>> a = array.array('i', [1,2,3])
>>> a
array('i', [1, 2, 3])

We can use numpy.frombuffer if we want our np.array to share the underlying memory of the buffer:

>>> arr = np.frombuffer(a, dtype='int32')
>>> arr
array([1, 2, 3], dtype=int32)

EDIT: WARNING

As stated by @user2357112 in the comments:

Watch out - numpy.frombuffer is still using the old buffer protocol (or on Python 3, the compatibility functions that wrap the new buffer protocol in an old-style interface), so it's not very memory-safe. If you create a NumPy array from an array.array or bytearray with frombuffer, you must not change the size of the underlying array. Doing so risks arbitrary memory corruption and segfaults when you access the NumPy array.
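To make the warning concrete, here is a minimal sketch of the pattern to avoid (a2 and view2 are just illustrative names; the dangerous line is deliberately commented out):

>>> a2 = array.array('i', [1, 2, 3])
>>> view2 = np.frombuffer(a2, dtype='int32')
>>> a2[0] = 99       # in-place mutation is fine; view2 sees the new value
>>> view2
array([99,  2,  3], dtype=int32)
>>> # a2.append(4)   # DON'T: growing the array may reallocate its buffer,
>>> #                # leaving view2 pointing at freed memory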

Note, I had to explicitly pass dtype='int32' because I initialized my array.array with the 'i' signed-int typecode, which on my system corresponds to a 32-bit int. Now, presto:

>>> a
array('i', [1, 2, 3])
>>> a[0] = 88
>>> a
array('i', [88, 2, 3])
>>> arr
array([88,  2,  3], dtype=int32)
>>>
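As an aside, if you'd rather not hard-code the dtype, you can derive it from the array's own typecode. This is just a sketch and assumes the typecode is one numpy also understands (e.g. 'i', 'f', 'd'):

>>> np.frombuffer(a, dtype=a.typecode)
array([88,  2,  3], dtype=int32)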

Now, if we use dtype=object, we actually can share the underlying objects. With numerical types, though, we can't mutate the shared values, only replace them, because Python ints are immutable. However, we can wrap a Python int in a class to make a mutable object:

>>> class MutableInt:
...     def __init__(self, val):
...         self.val = val
...     def __repr__(self):
...         return repr(self.val)
...
>>> obj_list = [MutableInt(i) for i in range(1, 8)]
>>> obj_list
[1, 2, 3, 4, 5, 6, 7]

Now, we create an array that consists of the same objects:

>>> obj_array = np.array(obj_list, dtype=object)
>>> obj_array
array([1, 2, 3, 4, 5, 6, 7], dtype=object)

Now, we can mutate the int wrapper in the list:

>>> obj_list[0].val = 88
>>> obj_list
[88, 2, 3, 4, 5, 6, 7]

And the effects are visible in the numpy array!:

>>> obj_array
array([88, 2, 3, 4, 5, 6, 7], dtype=object)

Note, though, that you've now essentially created a less useful version of a Python list: one that isn't resizable and doesn't have the nice amortized O(1) append behavior. You also lose any memory-efficiency gains that a numpy array might give you!
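As a rough illustration of that last point (a sketch; the exact numbers assume a 64-bit build, where an object array stores an 8-byte pointer per element on top of the wrapper objects themselves):

>>> numeric = np.arange(1000, dtype='int32')
>>> numeric.nbytes      # 4 bytes per element, stored inline in the buffer
4000
>>> pointers = np.array([MutableInt(i) for i in range(1000)], dtype=object)
>>> pointers.nbytes     # 8 bytes per pointer; the MutableInt objects live elsewhere
8000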

Also, note that in the above, obj_list and obj_array are not sharing the same underlying buffer; they are two different arrays holding the same PyObject pointer values:

>>> obj_list[1] = {}
>>> obj_array
array([88, 2, 3, 4, 5, 6, 7], dtype=object)
>>> obj_list
[88, {}, 3, 4, 5, 6, 7]
>>>

We cannot access the underlying buffer of a Python list, because it is not exposed. In theory it could be, if lists implemented the buffer protocol: https://docs.python.org/3/c-api/buffer.html#bufferobjects

But they don't. bytes and bytearray objects, on the other hand, do expose the buffer protocol. bytes is essentially the Python 2 str, and bytearray is a mutable version of bytes, so they are more or less mutable char arrays, like in C:

>>> barr = bytearray([65, 66, 67, 68])
>>> barr
bytearray(b'ABCD')

Now, let's make a numpy array that shares the underlying buffer:

>>> byte_array = np.frombuffer(barr, dtype='int8')
>>> byte_array
array([65, 66, 67, 68], dtype=int8)

Now, we will see changes reflected across both objects:

>>> byte_array[1] = 98
>>> byte_array
array([65, 98, 67, 68], dtype=int8)
>>> barr
bytearray(b'AbCD')
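The sharing works in the other direction too; continuing the same session as a quick check:

>>> barr[0] = 97
>>> byte_array
array([97, 98, 67, 68], dtype=int8)
>>> barr
bytearray(b'abCD')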

Now, before you think you can use this to subvert the immutability of Python bytes objects, think again:

>>> bs = bytes([65, 66, 67, 68])
>>> bs
b'ABCD'
>>> byte_array = np.frombuffer(bs, dtype='int8')
>>> byte_array
array([65, 66, 67, 68], dtype=int8)
>>> bs
b'ABCD'
>>> byte_array[1] = 98
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: assignment destination is read-only
>>>
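A quick way to see this up front, rather than discovering it at assignment time, is to check the array's writeable flag (a small aside):

>>> byte_array.flags.writeable
False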

Upvotes: 3

hpaulj

Reputation: 231385

a = [1,2,3]

A list contains pointers to objects (in this case integers) elsewhere in memory.

b = a

b points to the same list as a; it's just another name for the same object.

c = a[:]

c is a new list, but it contains the same pointers as a.

arr = np.array(a)

arr has the same numeric values as a, but it stores those values in its own data buffer. It has, in effect, evaluated a and made a new object. There is no connection with a other than by value.

arr1 = arr[:]

a new array, but with a shared data buffer: a view.

arr2 = arr.copy()

a new array with its own data buffer.

arr[0]

is a number, created from arr. It's equal in value to a[0] but does not reference the same numeric object.
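To make those relationships checkable, here is a small sketch using the names above; np.shares_memory reports whether two arrays overlap in memory:

b is a                        # True: same list object
c is a                        # False: a new list holding the same pointers
np.shares_memory(arr, arr1)   # True: the slice is a view onto arr's buffer
np.shares_memory(arr, arr2)   # False: the copy has its own data buffer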

Upvotes: 0
