eretmochelys

Reputation: 613

get view of numpy array using boolean or sequence object (advanced indexing)

How does one return a view (not a copy) of a numpy array via either boolean or a tuple of ints as the index?

The trouble is that this typically returns a copy. From the NumPy indexing documentation:

Advanced indexing is triggered when the selection object, obj, is a non-tuple sequence object, an ndarray (of data type integer or bool), or a tuple with at least one sequence object or ndarray (of data type integer or bool). There are two types of advanced indexing: integer and Boolean.

Advanced indexing always returns a copy of the data (contrast with basic slicing that returns a view).
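A quick way to check the quoted rule is `np.shares_memory`, which reports whether two arrays overlap in memory. The sketch below is illustrative only: the array and index values are arbitrary. It compares a basic slice with integer-array and boolean indexing that select the same rows.

```python
import numpy as np

x = np.arange(24).reshape((4, 3, 2))

basic = x[1:3]                                    # basic slicing: a view
fancy = x[[1, 2]]                                 # integer-array indexing: a copy
masked = x[np.array([False, True, True, False])]  # boolean indexing: a copy

print(np.shares_memory(x, basic))   # True
print(np.shares_memory(x, fancy))   # False
print(np.shares_memory(x, masked))  # False

# Writing through the view changes x; writing to the copy does not.
basic[0, 0, 0] = -1
fancy[0, 1, 0] = 100
print(x[1, 0, 0], x[1, 1, 0])       # -1 8
```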

My motivation for doing so is to save on memory. Here is a quick example of the problem:

import numpy as np

big_number = 10
x = np.ones((big_number, big_number, big_number))

# an irregular selection along the first axis
sub_array = np.s_[(1, 2, 3, 5, 7), :, :]
y = x[sub_array]
print(y.flags['OWNDATA'])

True

In general, the tuple of indices (1, 2, 3, 5, 7) has no regular structure, so I'm stumped as to how to massage it into the regular strides needed for basic NumPy indexing.
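When the indices *do* happen to form an arithmetic progression, they can be rewritten as a slice by hand, and basic slicing then returns a view. Below is a minimal sketch of that manual check. `as_slice_if_regular` is a hypothetical helper, not a NumPy function, and for simplicity it rejects negative indices. For the irregular tuple in the question it returns `None`, which is exactly the case where no single stride exists.

```python
import numpy as np

def as_slice_if_regular(indices):
    """Hypothetical helper: return an equivalent slice if `indices` is an
    arithmetic progression of non-negative ints, else None."""
    idx = [int(i) for i in indices]
    if not idx or any(i < 0 for i in idx):
        return None
    if len(idx) == 1:
        return slice(idx[0], idx[0] + 1)
    step = idx[1] - idx[0]
    if step == 0 or any(b - a != step for a, b in zip(idx, idx[1:])):
        return None
    stop = idx[-1] + step
    if stop < 0:              # descending run that ends at index 0
        stop = None
    return slice(idx[0], stop, step)

x = np.ones((10, 10, 10))
s = as_slice_if_regular((1, 3, 5, 7))
y = x[s, :, :]
print(s)                                      # slice(1, 9, 2)
print(np.shares_memory(x, y))                 # True: a view
print(as_slice_if_regular((1, 2, 3, 5, 7)))   # None: no single step fits
```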

Upvotes: 2

Views: 695

Answers (2)

hpaulj

Reputation: 231325

One way to visualize whether two arrays can share memory is to compare their elements in memory order, using ravel(order='K'):

In [422]: x = np.arange(24).reshape((4,3,2))
In [423]: x
Out[423]: 
array([[[ 0,  1],
        [ 2,  3],
        [ 4,  5]],

       [[ 6,  7],
        [ 8,  9],
        [10, 11]],

       [[12, 13],
        [14, 15],
        [16, 17]],

       [[18, 19],
        [20, 21],
        [22, 23]]])
In [424]: y = x[[1,3,0,2],:,:]  # rearrange the 1st axis
In [425]: y
Out[425]: 
array([[[ 6,  7],
        [ 8,  9],
        [10, 11]],

       [[18, 19],
        [20, 21],
        [22, 23]],

       [[ 0,  1],
        [ 2,  3],
        [ 4,  5]],

       [[12, 13],
        [14, 15],
        [16, 17]]])

In [428]: x.ravel(order='K')
Out[428]: 
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23])
In [429]: y.ravel(order='K')
Out[429]: 
array([ 6,  7,  8,  9, 10, 11, 18, 19, 20, 21, 22, 23,  0,  1,  2,  3,  4,
        5, 12, 13, 14, 15, 16, 17])

Notice how the elements in y occur in a different order. There's no way that we can 'stride' through x to get y.
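By contrast, some reorderings do have a constant step. A reversal of the first axis is a step of -1, so basic slicing can express it as a view with a negative stride, while the equivalent integer index still produces a copy. A small sketch (int64 is chosen explicitly so the byte strides are predictable):

```python
import numpy as np

x = np.arange(24, dtype=np.int64).reshape((4, 3, 2))

rev_fancy = x[[3, 2, 1, 0]]   # advanced indexing: copy
rev_slice = x[::-1]           # basic slicing: view with a negative first stride

print(np.array_equal(rev_fancy, rev_slice))  # True: same values
print(np.shares_memory(x, rev_fancy))        # False
print(np.shares_memory(x, rev_slice))        # True
print(x.strides, rev_slice.strides)          # (48, 16, 8) (-48, 16, 8)
```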

Without the order parameter, ravel uses 'C' order, which can confuse us when the new array involves some sort of axis transpose. As noted in the other answer, x.T is a view, achieved by reordering the axes and hence changing the strides.

In [430]: x.T.ravel()   # transposed array viewed row by row
Out[430]: 
array([ 0,  6, 12, 18,  2,  8, 14, 20,  4, 10, 16, 22,  1,  7, 13, 19,  3,
        9, 15, 21,  5, 11, 17, 23])
In [431]: x.T.ravel(order='K')   # transposed array viewed column by column
Out[431]: 
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23])
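The same conclusions can be checked programmatically. This sketch (int64 assumed so the byte strides are fixed) confirms that x.T shares x's buffer with reversed strides. The C-order ravel of x.T, which is not a constant step through that buffer, has to copy.

```python
import numpy as np

x = np.arange(24, dtype=np.int64).reshape((4, 3, 2))
xt = x.T   # reverses the axes: a view, no data copied

print(np.shares_memory(x, xt))   # True
print(x.strides, xt.strides)     # (48, 16, 8) (8, 16, 48)

# order='K' follows memory order, which is x's original C order.
print(np.array_equal(xt.ravel(order='K'), np.arange(24)))  # True

# C-order ravel of x.T is 0, 6, 12, ... : no fixed stride over x, so it copies.
print(np.shares_memory(x, xt.ravel()))  # False
```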

__array_interface__ is a handy tool for looking at the underlying structure of an array:

In [432]: x.__array_interface__
Out[432]: 
{'data': (45848336, False),
 'strides': None,
 'descr': [('', '<i8')],
 'typestr': '<i8',
 'shape': (4, 3, 2),
 'version': 3}
In [433]: y.__array_interface__
Out[433]: 
{'data': (45892944, False),
 'strides': None,
 'descr': [('', '<i8')],
 'typestr': '<i8',
 'shape': (4, 3, 2),
 'version': 3}
In [434]: x.T.__array_interface__
Out[434]: 
{'data': (45848336, False),     # same as for x
 'strides': (8, 16, 48),        # reordered strides
 'descr': [('', '<i8')],
 'typestr': '<i8',
 'shape': (2, 3, 4),
 'version': 3}
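The 'data' entry can also be used numerically. A view starts at the original address plus a byte offset, while a copy gets a fresh buffer. In the small sketch below, `data_address` is a hypothetical helper that just reads the pointer out of `__array_interface__`, and int64 is assumed so the offsets are predictable.

```python
import numpy as np

x = np.arange(24, dtype=np.int64).reshape((4, 3, 2))

def data_address(a):
    """Hypothetical helper: start address of an array's data buffer."""
    return a.__array_interface__['data'][0]

print(data_address(x.T) == data_address(x))              # True: same start
print(data_address(x[1:]) - data_address(x))             # 48: one step along axis 0
print(data_address(x[[1, 3, 0, 2]]) == data_address(x))  # False: new buffer
```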

Upvotes: 1

macroeconomist

Reputation: 701

Views in NumPy are based on looking at the same data in memory, just with some mix of a different starting point and "strides" that must be traversed for each dimension. (Strides tell us the number of bytes we need to move in the array when we increase the index by one in each dimension.)
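For a freshly allocated, C-contiguous (row-major) array, each stride is the item size times the product of the later dimensions. The hypothetical helper `c_strides` below is a sketch of that rule. It reproduces the strides NumPy reports for a float64 array of shape (20, 30, 5).

```python
import numpy as np

def c_strides(shape, itemsize):
    """Hypothetical helper: byte strides of a C-contiguous array."""
    strides = []
    step = itemsize
    for dim in reversed(shape):   # last axis moves by one item at a time
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))

x = np.empty((20, 30, 5))              # float64, itemsize 8
print(c_strides(x.shape, x.itemsize))  # (1200, 40, 8)
print(x.strides)                       # (1200, 40, 8)
```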

If the array you want can be expressed in this way given the original array, then it should be easy enough to construct it as a view of the original array. For instance, in your comment you mention shuffling the order of the axes; that's just a call to np.transpose and gives you a view. In general, though, fancy indexing won't give you a subarray of that form, which is why NumPy doesn't return a view from it. (It isn't "smart" enough to identify the special cases where a view would be possible; you have to handle those manually.)

Some examples:

In [1]: import numpy as np
   ...: x = np.empty((20,30,5))
   ...: x.strides

Out[1]: (1200, 40, 8)

In [2]: y = x.transpose((1,2,0))
   ...: y.strides

Out[2]: (40, 8, 1200)

In [3]: y.flags['OWNDATA']
Out[3]: False

In [4]: z = x[12:1:-2, 1:25:4, :]
   ...: z.strides

Out[4]: (-2400, 160, 8)

In [5]: z.flags['OWNDATA']
Out[5]: False

Permuting the axes of x with transpose to get y just permuted the strides. The fairly complex basic slicing used to get z also changed the strides (the first was multiplied by -2 and the second by 4, because those were the slice steps).
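The arithmetic for z can be verified directly. Each stride is the original stride times its slice step, and the view's data pointer starts at the first selected element, x[12, 1, 0]. This is a sketch using the same shapes as above; np.empty's default float64 dtype is assumed.

```python
import numpy as np

x = np.empty((20, 30, 5))
z = x[12:1:-2, 1:25:4, :]

# Each stride is the original stride times the slice step.
expected = (x.strides[0] * -2, x.strides[1] * 4, x.strides[2] * 1)
print(z.strides == expected)   # True: (-2400, 160, 8)

# The view starts at x[12, 1, 0]: 12 steps along axis 0 plus 1 along axis 1.
offset = z.__array_interface__['data'][0] - x.__array_interface__['data'][0]
print(offset == 12 * x.strides[0] + 1 * x.strides[1])   # True: 14440 bytes
print(z.shape)                                          # (6, 6, 5)
```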

We can see that both y and z are views because the OWNDATA flag is False.

Upvotes: 0
