Dieter Weber

Reputation: 543

How to prevent accidental assignment into empty NumPy views

Consider the following Python + NumPy code that executes without error:

import numpy as np

a = np.array((1, 2, 3))

a[13:17] = 23

A slice that extends beyond the array bounds is silently truncated, and if both start and stop lie beyond the bounds, the result is an empty view. Assigning to such a slice silently discards the value.
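To make the behavior concrete, here is a minimal sketch of what happens with the array from the question. The variable names `empty_view` and `truncated` are only illustrative.

```python
import numpy as np

a = np.array((1, 2, 3))

empty_view = a[13:17]   # start and stop are both past the end
truncated = a[1:17]     # only stop is past the end

print(empty_view.shape)     # (0,)
print(truncated.tolist())   # [2, 3]

a[13:17] = 23               # no error, and nothing is written
print(a.tolist())           # [1, 2, 3]
```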

In my use case the indices are calculated in a non-trivial way and are used to manipulate selected parts of an array. The above behavior means that I might silently skip parts of that manipulation if the indices are miscalculated. That can be hard to detect and can lead to "almost correct" results, i.e. the worst kind of programming errors.

For that reason I'd like to have strict checking for slices so that a start or stop outside the array bounds triggers an error. Is there a way to enable that in NumPy?

As additional information, the arrays are large and the operation is performed very often, i.e. there should be no performance penalty. Furthermore, the arrays are often multidimensional, including multidimensional slicing.

Upvotes: 4

Views: 143

Answers (3)

loopy walt

Reputation: 968

Depending on how complicated your indices are (read: how much pain in the backside it is to predict shapes after slicing), you may want to compute the expected shape directly and then reshape to it. If the size of your actual sliced array doesn't match, this will raise an error. Overhead is minor:

import numpy as np
from timeit import timeit


def use_reshape(a,idx,val):
    expected_shape = ((s.stop-s.start-1)//(s.step or 1) + 1 if isinstance(s,slice) else 1 for s in idx)
    a[idx].reshape(*expected_shape)[...] = val

def no_check(a,idx,val):
    a[idx] = val
    
val = 23
idx = np.s_[13:1000:2,14:20]
for f in (no_check,use_reshape):
    a = np.zeros((1000,1000))
    print(f.__name__)
    print(timeit(lambda:f(a,idx,val),number=1000),'ms')
    assert (a[idx] == val).all()
    
# check it works
print("\nThis should raise an exception:\n")
use_reshape(a,np.s_[1000:1001,10],0)

Please note that this is proof-of-concept code. To make it safe, you'd have to check for unexpected index kinds, matching numbers of dimensions and, importantly, indices that select a single element.

Running it anyway:

no_check
0.004587646995787509 ms
use_reshape
0.006306983006652445 ms

This should raise an exception:

Traceback (most recent call last):
  File "check.py", line 22, in <module>
    use_reshape(a,np.s_[1000:1001,10],0)
  File "check.py", line 7, in use_reshape
    a[idx].reshape(*expected_shape)[...] = val
ValueError: cannot reshape array of size 0 into shape (1,1)
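The caveats mentioned above (unexpected index kinds, number of dimensions, single-element indices) could also be covered by checking the index objects directly before assigning. The following is only a sketch: `checked_assign` is a hypothetical helper, not a NumPy function, and it assumes non-negative bounds and positive steps, so negative indices are rejected even though NumPy itself would accept them.

```python
import numpy as np


def checked_assign(a, idx, val):
    """Assign val to a[idx]; raise IndexError for out-of-range bounds.

    Hypothetical helper. Supports only slices and integers, with
    non-negative bounds and positive steps (an assumption of this sketch).
    """
    if not isinstance(idx, tuple):
        idx = (idx,)
    if len(idx) > a.ndim:
        raise IndexError(f"{len(idx)} indices for {a.ndim} dimensions")
    for axis, (s, n) in enumerate(zip(idx, a.shape)):
        if isinstance(s, slice):
            if s.step is not None and s.step <= 0:
                raise ValueError("only positive steps are supported here")
            # Bounds left as None are always valid; explicit bounds
            # must lie in [0, n] because stop is exclusive.
            for bound in (s.start, s.stop):
                if bound is not None and not 0 <= bound <= n:
                    raise IndexError(
                        f"slice bound {bound} out of range for axis {axis} with size {n}")
        elif isinstance(s, (int, np.integer)):
            if not 0 <= s < n:
                raise IndexError(
                    f"index {s} out of range for axis {axis} with size {n}")
        else:
            raise TypeError(f"unsupported index kind: {type(s).__name__}")
    a[idx] = val


a = np.zeros((1000, 1000))
checked_assign(a, np.s_[13:1000:2, 14:20], 23)    # in range: plain assignment

try:
    checked_assign(a, np.s_[1000:1001, 10], 0)    # stop 1001 exceeds size 1000
except IndexError as exc:
    print("raised:", exc)
```

The check only inspects the index objects and never touches the array data, so its cost should not grow with the array size. It does not reject slices with start greater than stop, which also select nothing.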

Upvotes: 1

user1635327

Reputation: 1641

One way to achieve the behavior you want is to use ranges instead of slices:

a = np.array((1, 2, 3))
a[np.arange(13, 17)] = 23
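As a quick sketch of what that does: indexing with an integer array is bounds-checked, so the out-of-range case raises an IndexError, while an in-range index array writes the same elements as the corresponding slice. The name `error` is only illustrative. Note that np.arange allocates an index array, so this is likely to cost more than a plain slice on large ranges.

```python
import numpy as np

a = np.array((1, 2, 3))

error = None
try:
    a[np.arange(13, 17)] = 23   # integer-array index: bounds are checked
except IndexError as exc:
    error = exc
print("raised:", error)

a[np.arange(1, 3)] = 23         # in range: writes the same elements as a[1:3] = 23
print(a.tolist())               # [1, 23, 23]
```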

I think NumPy's behavior here is consistent with the behavior of pure Python's lists and should be expected. Instead of workarounds, it might be better for code readability to explicitly add asserts:

index_1, index_2 = ... # a complex computation
assert 0 <= index_1 < index_2 <= a.shape[0]  # stop is exclusive, so it may equal the length
a[index_1:index_2] = 23

Upvotes: 1

Ivan

Reputation: 40728

You could use np.put_along_axis instead, which seems to fit your needs:

>>> a = np.array((1, 2, 3))
>>> np.put_along_axis(a, indices=np.arange(13, 17), axis=0, values=23)

The above will raise the following error:

IndexError: index 13 is out of bounds for axis 0 with size 3

The values parameter can be either a scalar or another NumPy array.

Or in a shorter form:

>>> np.put_along_axis(a, np.r_[13:17], 23, 0)

Edit: Alternatively, np.put has a mode='raise' option (which is set by default):

np.put(a, ind, v, mode='raise')

  • a: ndarray - Target array.

  • ind: array_like - Target indices, interpreted as integers.

  • v: array_like - Values to place in a at target indices. [...]

  • mode: {'raise', 'wrap', 'clip'} optional - Specifies how out-of-bounds indices will behave.

    • 'raise' – raise an error (default)
    • 'wrap' – wrap around
    • 'clip' – clip to the range

The default behavior will be:

>>> np.put(a, np.r_[13:17], 23)

IndexError: index 13 is out of bounds for axis 0 with size 3

while with mode='clip', it remains silent:

>>> np.put(a, np.r_[13:17], 23, mode='clip')
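Two details are worth keeping in mind here, shown as a small sketch below: with mode='clip' the call is silent but still writes (to the last element), and np.put addresses the flattened array, so for multidimensional arrays the indices are flat positions rather than per-axis ones. The array `b` and the name `put_error` are only illustrative.

```python
import numpy as np

a = np.array((1, 2, 3))
np.put(a, np.r_[13:17], 23, mode='clip')   # every index is clipped to 2
print(a.tolist())                          # [1, 2, 23]

b = np.zeros((2, 3), dtype=int)
np.put(b, [4], 1)                          # flat index 4 is b[1, 1] in C order
print(b.tolist())                          # [[0, 0, 0], [0, 1, 0]]

put_error = None
try:
    np.put(b, [6], 1)                      # b.size == 6, so flat index 6 is out of range
except IndexError as exc:
    put_error = exc
print("raised:", put_error)
```

Because of the flat addressing, an index that is out of range for one axis can still be a valid flat position, so np.put alone does not give per-axis bounds checking on multidimensional arrays.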

Upvotes: 2
