lxy

Reputation: 459

numpy boolean indexing selecting and setting

I'm not very familiar with Python. I have been reading the book 'Python for Data Analysis' recently, and I'm a bit confused about NumPy boolean indexing and setting. The book says:

Selecting data from an array by boolean indexing always creates a copy of the data, even if the returned array is unchanged.

Setting values with boolean arrays works in a common-sense way.

I tried it with the following code:

First:

import numpy as np

data = np.random.randn(7, 4)
data[data < 0] = 0  # this changes `data` in place

Second:

data = np.random.randn(7, 4)
copied = data[data < 0]
copied[1] = 1  # this does not change `data`

I don't quite understand this. Can anyone explain it? In my understanding, copied should be a pointer to the data[data < 0] slice.

Upvotes: 0

Views: 1857

Answers (3)

Paul Panzer

Reputation: 53089

As a rule of thumb, NumPy creates a view where possible and a copy where necessary.

When is a view possible? When the data can be addressed using strides, i.e., for a 2D array A, each A[i, j] sits in memory at address base + i*strides[0] + j*strides[1]. If you create a subarray using only slices, this will always be the case, which is why you get a view.
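To make the address formula concrete, here is a minimal sketch. The array contents and the slice are arbitrary choices for illustration, and the dtype is fixed to int64 so that each element occupies 8 bytes.

```python
import numpy as np

# 3x4 array of int64: each element is 8 bytes, each row is 4 * 8 = 32 bytes
A = np.arange(12, dtype=np.int64).reshape(3, 4)
print(A.strides)  # (32, 8)

# A slice only needs a new base address and new strides, so it can be a view
B = A[::2, 1::2]  # rows 0 and 2, columns 1 and 3
print(B.strides)  # (64, 16)
print(np.shares_memory(A, B))  # True

# Check the formula: byte offset of B[i, j] relative to the start of A's buffer
i, j = 1, 1
base_a = A.__array_interface__["data"][0]
base_b = B.__array_interface__["data"][0]
offset_bytes = (base_b - base_a) + i * B.strides[0] + j * B.strides[1]
flat_index = offset_bytes // A.itemsize

# The element found through raw address arithmetic matches normal indexing
print(A.ravel()[flat_index] == B[i, j])  # True
```

Because B is described entirely by a base address and two strides into A's buffer, writing to B also changes A.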

For logical and advanced indexing, it is typically not possible to find a base and strides that address exactly the right elements. Therefore these operations return a new array with the data copied.
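One way to check which case you are in is np.shares_memory. The following is a small sketch with made-up values; the variable names are only for illustration.

```python
import numpy as np

data = np.arange(12.0).reshape(3, 4)

sliced = data[:2]        # basic slicing
masked = data[data > 5]  # boolean (logical) indexing
fancy = data[[0, 2]]     # integer-array (advanced) indexing

print(np.shares_memory(data, sliced))  # True: a view
print(np.shares_memory(data, masked))  # False: a copy
print(np.shares_memory(data, fancy))   # False: a copy

# Writing to the copy leaves the original untouched
masked[0] = -1.0
print(data[1, 2])  # 6.0

# Writing to the view changes the original
sliced[0, 0] = 100.0
print(data[0, 0])  # 100.0
```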

Upvotes: 3

Daniel F

Reputation: 14399

While data[data < 0] = 0 sort of looks like a view being set to 0, that's not what's actually happening. In reality, an indexed ndarray on the left-hand side of = calls __setitem__, which handles the element-by-element assignment in place.

When the indexed ndarray is on the right-hand side of the =, __setitem__ isn't called. Instead, __getitem__ returns a copy (as boolean indexing always does), which is independent of the original array.

Essentially:

foo[foo != bar] = bar                # calls __setitem__
foo[:2]         = bar                # calls __setitem__
bar             = foo[foo != bar]    # makes a copy
bar             = foo[:2]            # makes a view
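Spelling the method calls out explicitly may help. This is only a sketch with made-up values; you would normally write the bracket syntax rather than calling the dunder methods directly.

```python
import numpy as np

foo = np.array([-1.0, 2.0, -3.0, 4.0])
mask = foo < 0

# Same as: foo[mask] = 0.0 (writes into foo's own buffer)
foo.__setitem__(mask, 0.0)
print(foo.tolist())  # [0.0, 2.0, 0.0, 4.0]

# Same as: picked = foo[foo > 1] (boolean indexing returns a copy)
picked = foo.__getitem__(foo > 1)
picked[0] = 99.0
print(foo.tolist())  # [0.0, 2.0, 0.0, 4.0], unchanged

# To push the modified values back, assign through the index again
foo[foo > 1] = picked
print(foo.tolist())  # [0.0, 99.0, 0.0, 4.0]
```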

Upvotes: 5

Bhargava Krishna

Reputation: 11

Based on the sequence of the code:

  1. data = np.random.randn(7, 4) : This step creates a 7-by-4 array of random values
  2. data[data < 0] = 0 : sets all the elements in data which are < 0 to 0
  3. copied = data[data < 0] : This step generates an empty array, as there is no element in data which is < 0, because of step 2
  4. copied[1] = 1 : This step raises an error, as copied is an empty array and thus index 1 does not exist
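The four steps can be reproduced with the sketch below. It assumes they are run one after another on the same array, i.e. data is not re-created between steps 2 and 3.

```python
import numpy as np

data = np.random.randn(7, 4)  # step 1: 7-by-4 array of random values
data[data < 0] = 0            # step 2: negative entries are set to 0 in place

copied = data[data < 0]       # step 3: nothing is negative any more
print(copied.shape)           # (0,)

try:
    copied[1] = 1             # step 4: index 1 is out of bounds for an empty array
except IndexError as exc:
    print("IndexError:", exc)
```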

Upvotes: 1
