Reputation: 73

Why does insert and append for numpy ndarray return a new array instead of modifying the original array?

For numpy ndarray, there are no append, and insert as there are for native python lists.

a = np.array([1, 2, 3])
a.append(5)  # this does not work
a = np.append(a, 5)  # this is the only way

Whereas for native python lists,

a = [1, 2, 3]
a.append(4)  # this modifies a
a  # [1, 2, 3, 4]

Why was numpy ndarray designed to be this way? I'm writing a subclass of ndarray, is there any way of implementing "append" like native python arrays?

Upvotes: 4

Answers (2)

hpaulj

Reputation: 231615

ndarray is created with a fixed size databuffer - just big enough to hold the bytes representing the elements.

arr.nbytes == arr.itemsize * arr.size

arr.resize can change the array inplace. But read it's docs to see the limitations, especially about owning its own data. It's one of the few inplace operations, and not used that often.

In contrast a Python list stores object pointers in a buffer. The buffer has some growth room allowing for efficient append. It just has to add a new pointer to the buffer. When the buffer fills up, it allocates a new larger buffer and copies the pointers.

For a 1d array the buffers for ndarray and list will be similar, at least for 4 or 8 bytes numeric dtypes. But for multidimensional arrays, the databuffer can be very large (the product of all dimensions), while the top buffer of an equivalent nested array just contains pointers to the outer layer of lists (the 'rows').

Object dtype arrays store pointers like a list, but the databuffer still has the fixed size (no growth space). Performance lies between numeric arrays and lists.

I can imagine writing an inplace append that uses the resize method, followed by copying the new value(s) to the 0 fills.

In [96]: arr = np.array([[1,3],[2,7]])
In [97]: arr.resize(3,2)
In [98]: arr
Out[98]: 
array([[1, 3],
       [2, 7],
       [0, 0]])
In [99]: arr[-1,:] = 10,11
In [100]: arr
Out[100]: 
array([[ 1,  3],
       [ 2,  7],
       [10, 11]])

But notice what happens to values when we resize an inner axis:

In [101]: arr = np.array([[1,3],[2,7]])
In [102]: arr.resize(2,3)
In [103]: arr
Out[103]: 
array([[1, 3, 2],
       [7, 0, 0]])

So this kind of append is quite limited compared to concatenate (and all of its 'stack' derivatives).

Have you looked at the code for np.append? After making sure the arguments are arrays, and tweaking their shapes, it does:

concatenate((arr, values), axis=axis)

In other words, it is just an alternative way of calling concatenate. It's probably best for adding a single value to a 1d array. It shouldn't be used repeatedly in a loop, precisely because it returns a new array, and thus is relatively expensive. Otherwise its use messes up many users. Some ignore the axis parameter. Others have problems creating a correct 'empty' array to start with. Concatenate also has those problems, but at least users have to consciously deal the issue of matching shapes.

np.insert is much more complicated. It does different things depending on whether the indices (obj) is a number, slice or list of numbers. One approach is to create a target array of the right size, and copy slices from the original and insert values to the right slots. Another is to use a boolean mask to copy values into the right locations. Both have to accommodate multidimensions - it inserts along one axis, but must use the appropriate slice(None) for the other dimensions. This is much more complicated than the list insert, which inserts one object (pointer) at one location in 1d.

Upvotes: 1

user2357112

Reputation: 281683

NumPy makes heavy use of views, a feature that Python lists do not support. A view is an array that uses the memory of another object rather than owning its own memory; for example, in the following snippet

a = numpy.arange(5)
b = a[1:3]

b is a view of a.

Views would interact very poorly with an in-place append or other in-place size-changing operations. Arrays would suddenly not be views of arrays they should be views of, or they would be views of deallocated memory, or it would be unpredictable whether an append on one array would affect an array it was a view of, or all sorts of other problems. For example, what would a look like after b.append(6)? Or what would b look like after a.clear()? And what kind of performance guarantees could you make? Probably not the amortized constant time guarantee of list.append.

If you want to append, you probably shouldn't be using NumPy arrays; you should use a list, and build an array from the list when you're done appending.

Upvotes: 6

Why does insert and append for numpy ndarray return a new array instead of modifying the original array?

Answers (2)

Related Questions