ZK Zhao

Reputation: 21533

NumPy: Why the need to explicitly copy a value?

import numpy as np

arr = np.arange(0, 11)
slice_of_arr = arr[0:6]
slice_of_arr[:] = 99

# slice_of_arr returns
array([99, 99, 99, 99, 99, 99])
# arr returns
array([99, 99, 99, 99, 99, 99,  6,  7,  8,  9, 10])

As the example above shows, you cannot change the values of slice_of_arr without also changing arr, because slice_of_arr is a view of arr, not a new array.
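For comparison, here is a minimal sketch of the explicit-copy version that the questions refer to. The name slice_copy is only illustrative.

```python
import numpy as np

arr = np.arange(0, 11)
slice_copy = arr[0:6].copy()   # independent array with its own data
slice_copy[:] = 99             # only the copy is modified

print(slice_copy)  # [99 99 99 99 99 99]
print(arr)         # [ 0  1  2  3  4  5  6  7  8  9 10]
```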

My questions are:

  1. Why is NumPy designed like this? Isn't it tedious to call .copy every time before assigning values?
  2. Is there anything I can do to get rid of the .copy? How can I change this default behavior of NumPy?

Upvotes: 8

Views: 932

Answers (2)

J. Martinot-Lagarde

Reputation: 3550

I think you have the answers in the other comments, but more specifically:

1.a. Why does NumPy design like this?
Because it's way faster (constant time) to create a view rather than creating a whole array (linear time).
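A small sketch of how to check this yourself, using np.shares_memory and the .base attribute (the variable names are illustrative). The view reuses the buffer of arr, while the copy allocates a new one:

```python
import numpy as np

arr = np.arange(0, 11)
view = arr[0:6]             # basic slicing: no data is copied
copied = arr[0:6].copy()    # allocates and fills a new buffer

print(np.shares_memory(arr, view))    # True
print(view.base is arr)               # True
print(np.shares_memory(arr, copied))  # False
print(copied.base is None)            # True
```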

1.b. Wouldn't it be tedious every time you need to .copy and then assign value?
Actually, it's not that common to need a copy, so no, it's not tedious. Even if it can be surprising at first, this design is very good.

2.a. Is there anything I can do, to get rid of the .copy?
I can't really tell without seeing real code. In the toy example you give, you can't avoid creating a copy, but in real code you usually apply functions to the data, and those return another array, so an explicit copy isn't needed.
Can you give an example of real code where you need to call .copy repeatedly?
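To illustrate the point about functions returning new arrays, here is a minimal sketch (the names head, scaled and clipped are just for illustration). Arithmetic and most NumPy functions allocate their result, so the original stays untouched even though head is a view:

```python
import numpy as np

arr = np.arange(0, 11)
head = arr[0:6]                  # a view of arr

scaled = head * 2                # arithmetic allocates a new array
clipped = np.clip(head, 1, 4)    # so does np.clip without an out= argument

print(scaled)                          # [ 0  2  4  6  8 10]
print(clipped)                         # [1 1 2 3 4 4]
print(np.shares_memory(arr, scaled))   # False
print(arr)                             # [ 0  1  2  3  4  5  6  7  8  9 10]
```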

2.b. How can I change this default behavior of NumPy?
You can't. Try to get used to it; you'll see how powerful it is.

Upvotes: 3

hpaulj

Reputation: 231385

The question "What does (numpy) __array_wrap__ do?" talks about ndarray subclasses and hooks like __array_wrap__. np.array takes a copy parameter, which forces the result to be a copy even if one isn't required by other considerations. ravel returns a view where possible; flatten always returns a copy. So it is probably possible, and maybe not too difficult, to construct an ndarray subclass that forces a copy. It may involve modifying a hook like __array_wrap__.
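A short sketch of the existing copy/no-copy pairs mentioned above, checked with np.shares_memory (the variable a is illustrative). Note that ravel only returns a view when the memory layout allows it:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

forced = np.array(a, copy=True)   # always a new array
same = np.asarray(a)              # no copy for an existing ndarray

print(np.shares_memory(a, forced))       # False
print(same is a)                         # True
print(np.shares_memory(a, a.ravel()))    # True  (contiguous: view)
print(np.shares_memory(a, a.flatten()))  # False (always a copy)
print(np.shares_memory(a, a.T.ravel()))  # False (layout forces a copy)
```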

Or maybe modifying the .__getitem__ method. Indexing as in slice_of_arr = arr[0:6] involves a call to __getitem__. For ndarray this is compiled code, but for a masked array it is Python code that you could use as an example:

/usr/lib/python3/dist-packages/numpy/ma/core.py

It may be something as simple as

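def __getitem__(self, indx):
    """x.__getitem__(y) <==> x[y]
    """
    # _data = ndarray.view(self, ndarray) # change to:
    # (ndarray.copy takes an order argument, not a type, so view as a
    # plain ndarray first and then copy)
    _data = ndarray.view(self, ndarray).copy()
    dout = ndarray.__getitem__(_data, indx)
    return dout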

But I suspect that by the time you develop and fully test such a subclass, you might fall in love with the default no-copy approach. While this view-vs-copy business bites many newcomers (especially those coming from MATLAB), I haven't seen complaints from experienced users. Look at other numpy SO questions; you won't see a lot of copy() calls.

Even regular Python users are used to asking themselves whether a reference or slice is a copy or not, and whether something is mutable or not.

For example, with lists:

In [754]: ll=[1,2,[3,4,5],6]
In [755]: llslice=ll[1:-1]
In [756]: llslice[1][1:2]=[10,11,12]
In [757]: ll
Out[757]: [1, 2, [3, 10, 11, 12, 5], 6]

Modifying an item inside a slice modifies that same item in the original list. In contrast to numpy, a list slice is a copy, but it's a shallow copy. You have to make an extra effort to get a deep copy (import copy).
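A minimal sketch of the shallow-versus-deep distinction with the same list (the names shallow and deep are illustrative):

```python
import copy

ll = [1, 2, [3, 4, 5], 6]
shallow = ll[1:-1]                 # new outer list, same inner list object
deep = copy.deepcopy(ll[1:-1])     # inner list is duplicated too

shallow[0] = "changed"   # rebinding a slot in the slice does not touch ll
shallow[1][0] = -3       # mutating the shared inner list does
deep[1][0] = 99          # the deep copy is fully independent

print(ll)                   # [1, 2, [-3, 4, 5], 6]
print(shallow[1] is ll[2])  # True
print(deep[1])              # [99, 4, 5]
```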

/usr/lib/python3/dist-packages/numpy/lib/index_tricks.py contains some indexing functions aimed at making certain indexing operations more convenient. Several are actually classes, or class instances, with custom __getitem__ methods. They may also serve as models of how to customize your slicing and indexing.
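Two of those helpers, np.s_ and np.r_, are instances whose __getitem__ does the work. A short sketch of what they return (the variable names are illustrative):

```python
import numpy as np

arr = np.arange(0, 11)

# np.s_ turns indexing syntax into a reusable slice object.
first_six = np.s_[0:6]
print(first_six)                               # slice(0, 6, None)
print(np.shares_memory(arr, arr[first_six]))   # True: still a view

# np.r_ builds a new array by concatenating ranges and scalars.
picked = np.r_[0:3, 7, 8]
print(picked)                                  # [0 1 2 7 8]
```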

Upvotes: 1
