Reputation: 4805
If I create a list in Python and assign it to a second variable, changes to the first list are reflected in the second:
a = [1, 2, 3]
b = a
a[0] = 0
print(b)
>>> [0, 2, 3]
Is it possible to achieve this behavior when creating a numpy array from a list? What I want:
import numpy as np
a = [1, 2, 3]
b = np.array(a)
a[0] = 0
print(b)
>>> [ 0 2 3 ]
But what actually happens is that b is [1 2 3]. I realize that this is difficult due to dynamic resizing of the list. But if I could tell NumPy that this list is never resized, it should work somehow.
Is this behavior achievable? Or am I missing some really bad drawbacks?
Upvotes: 1
Views: 960
Reputation: 95957
Fundamentally, the issue is that Python lists are not really arrays. OK, CPython lists are array lists, but they are arrays of PyObject pointers, so they can hold heterogeneous data. See here for an excellent exposition on the implementation details of CPython lists. Also, they are resizable, and all the malloc and realloc bookkeeping is taken care of under the hood. However, you can achieve something like what you want if you use the plain Python arrays available in the standard-library array module.
>>> import numpy as np # third party
>>> import array # standard library module
Let's make a real array:
>>> a = array.array('i', [1,2,3])
>>> a
array('i', [1, 2, 3])
We can use numpy.frombuffer if we want our np.array to share the underlying memory of the buffer:
>>> arr = np.frombuffer(a, dtype='int32')
>>> arr
array([1, 2, 3], dtype=int32)
As stated by @user2357112 in the comments:
Watch out - numpy.frombuffer is still using the old buffer protocol (or on Python 3, the compatibility functions that wrap the new buffer protocol in an old-style interface), so it's not very memory-safe. If you create a NumPy array from an array.array or bytearray with frombuffer, you must not change the size of the underlying array. Doing so risks arbitrary memory corruption and segfaults when you access the NumPy array.
Note, I had to explicitly pass dtype='int32' because I initialized my array.array with the 'i' signed-int typecode, which on my system corresponds to a 32-bit int. Now, presto:
>>> a
array('i', [1, 2, 3])
>>> a[0] = 88
>>> a
array('i', [88, 2, 3])
>>> arr
array([88, 2, 3], dtype=int32)
>>>
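If you don't want to hard-code the dtype, you can derive it from the array's reported item size instead. This is a small sketch, assuming Python 3.6+ for the f-string; array.array exposes the byte width of its typecode via itemsize, so a matching signed-int dtype string like 'i4' can be built directly:

```python
import array

import numpy as np

a = array.array('i', [1, 2, 3])

# Build a matching dtype string such as 'i4' from the typecode's byte width,
# instead of hard-coding 'int32'.
arr = np.frombuffer(a, dtype=np.dtype(f'i{a.itemsize}'))

a[0] = 88      # mutate through the array.array side
print(arr[0])  # the NumPy view sees the change: 88
```

This keeps the dtype correct even on a platform where 'i' is not 32 bits.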
Now, if we use dtype=object, we actually can share the underlying objects. With numerical types, however, we can't mutate, only replace. But we can wrap a Python int in a class to make a mutable object:
>>> class MutableInt:
... def __init__(self, val):
... self.val = val
... def __repr__(self):
... return repr(self.val)
...
>>> obj_list = [MutableInt(i) for i in range(1, 8)]
>>> obj_list
[1, 2, 3, 4, 5, 6, 7]
Now, we create an array that consists of the same objects:
>>> obj_array = np.array(obj_list, dtype=object)
>>> obj_array
array([1, 2, 3, 4, 5, 6, 7], dtype=object)
Now, we can mutate the int wrapper in the list:
>>> obj_list[0].val = 88
>>> obj_list
[88, 2, 3, 4, 5, 6, 7]
And the effects are visible in the numpy array:
>>> obj_array
array([88, 2, 3, 4, 5, 6, 7], dtype=object)
Note, though, you've now essentially created a less useful version of a Python list: one that isn't resizable and doesn't have the nice amortized O(1) append behavior. We also lose any memory-efficiency gains that a numpy array might give you!
Also, note that in the above, obj_list and obj_array are not sharing the same underlying buffer; they are two different arrays holding the same PyObject pointer values:
>>> obj_list[1] = {}
>>> obj_array
array([88, 2, 3, 4, 5, 6, 7], dtype=object)
>>> obj_list
[88, {}, 3, 4, 5, 6, 7]
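You can verify both halves of that claim directly: the two containers reference identical objects, but each has its own slot storage. A quick check, reusing the MutableInt class from above:

```python
import numpy as np


class MutableInt:
    def __init__(self, val):
        self.val = val

    def __repr__(self):
        return repr(self.val)


obj_list = [MutableInt(i) for i in range(1, 4)]
obj_array = np.array(obj_list, dtype=object)

# The same objects are referenced from both containers...
print(obj_list[0] is obj_array[0])  # True

# ...but replacing a slot in one container does not affect the other.
obj_list[0] = MutableInt(99)
print(obj_array[0])                 # still 1
```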
>>>
We cannot access the underlying buffer of a Python list, because it is not exposed. Theoretically, lists could expose the buffer protocol: https://docs.python.org/3/c-api/buffer.html#bufferobjects
But they don't. bytes and bytearray objects do expose the buffer protocol. bytes is essentially the Python 2 str, and bytearray is a mutable version of bytes, so they are essentially mutable char arrays like in C:
>>> barr = bytearray([65, 66, 67, 68])
>>> barr
bytearray(b'ABCD')
Now, let's make a numpy array that shares the underlying buffer:
>>> byte_array = np.frombuffer(barr, dtype='int8')
>>> byte_array
array([65, 66, 67, 68], dtype=int8)
Now, we will see changes reflected across both objects:
>>> byte_array[1] = 98
>>> byte_array
array([65, 98, 67, 68], dtype=int8)
>>> barr
bytearray(b'AbCD')
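If you want to confirm the sharing rather than eyeball it, np.shares_memory (available since NumPy 1.11) reports whether two arrays overlap in memory. A short sketch, building two views over the same bytearray:

```python
import numpy as np

barr = bytearray(b'ABCD')
v1 = np.frombuffer(barr, dtype='int8')
v2 = np.frombuffer(barr, dtype='uint8')

# Both views were built over the same bytearray buffer.
print(np.shares_memory(v1, v2))  # True
```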
Now, before you think you can use this to subvert the immutability of Python bytes objects, think again:
>>> bs = bytes([65, 66, 67, 68])
>>> bs
b'ABCD'
>>> byte_array = np.frombuffer(bs, dtype='int8')
>>> byte_array
array([65, 66, 67, 68], dtype=int8)
>>> bs
b'ABCD'
>>> byte_array[1] = 98
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: assignment destination is read-only
>>>
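NumPy records this immutability in the array's flags, so you can check before attempting a write rather than catching the ValueError:

```python
import numpy as np

bs = bytes([65, 66, 67, 68])
ro = np.frombuffer(bs, dtype='int8')

# Views over immutable buffers are created read-only.
print(ro.flags.writeable)  # False
```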
Upvotes: 3
Reputation: 231385
a = [1,2,3]
A list contains pointers to objects (in this case integers) elsewhere in memory.
b = a
b points to the same list as a. It's just another name.
c = a[:]
c is new, but it contains the same pointers as a.
arr = np.array(a)
arr has the same numeric values as a, but it stores those values in its own data buffer. It has, in effect, evaluated a and made a new object. There is no connection, other than by value, with a.
arr1 = arr[:]
a new array, but with a shared data buffer: a view.
arr2 = arr.copy()
a new array with its own data buffer.
arr[0] is a number created from arr. It's equal in value to a[0], but does not reference the same numeric object.
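The distinctions above can be checked with np.shares_memory, which tells you whether two arrays use overlapping storage:

```python
import numpy as np

a = [1, 2, 3]
arr = np.array(a)   # values copied from the list into a new buffer

arr1 = arr[:]       # a view: shares arr's data buffer
arr2 = arr.copy()   # an independent copy with its own buffer

print(np.shares_memory(arr, arr1))  # True
print(np.shares_memory(arr, arr2))  # False

arr[0] = 99
print(arr1[0], arr2[0])             # 99 1  (view follows, copy does not)
```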
Upvotes: 0