I am looking for an efficient way of indexing the columns of a numpy array with several ranges, when only the indexes of the desired ranges are given.
For example, given the following array and a range size r_size=3:
import numpy as np
arr = np.arange(18).reshape((2,9))
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8],
[ 9, 10, 11, 12, 13, 14, 15, 16, 17]])
This would mean that there are a total of 3 sets of ranges [r0, r1, r2]
whose elements in the array are distributed as:
[[r0_00, r0_01, r0_02, r1_00, r1_01, r1_02, r2_00, r2_01, r2_02]
[r0_10, r0_11, r0_12, r1_10, r1_11, r1_12, r2_10, r2_11, r2_12]]
So if I want to access the ranges r0 and r2, I would obtain:
arr = np.arange(18).reshape((2,9))
r_size = 3
ranges = [0, 2]
# --------------------------------------------------------
# Line that indexes arr with the variable ranges... Output:
# --------------------------------------------------------
array([[ 0, 1, 2, 6, 7, 8],
[ 9, 10, 11, 15, 16, 17]])
The fastest way that I've found is the following:
import numpy as np
from itertools import chain
arr = np.arange(18).reshape((2,9))
r_size = 3
ranges = [0,2]
arr[:, list(chain(*[range(r_size*x,r_size*x+r_size) for x in ranges]))]
array([[ 0, 1, 2, 6, 7, 8],
[ 9, 10, 11, 15, 16, 17]])
But I am not sure if it can be improved in terms of speed.
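For reference, the same list of column indices can also be built with NumPy broadcasting instead of itertools (a sketch I have not benchmarked against the chain version):

```python
import numpy as np

arr = np.arange(18).reshape((2, 9))
r_size = 3
ranges = [0, 2]

# Outer sum of chunk start offsets and within-chunk offsets, flattened
# into a single 1-D index array, then used for a single fancy-index call.
cols = (np.asarray(ranges)[:, None] * r_size + np.arange(r_size)).ravel()
result = arr[:, cols]
```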
Thanks in advance!
Upvotes: 1
Views: 109
Reputation: 3348
You will inevitably need to copy the data to get the desired result in a contiguous array, so to make it efficient I would suggest minimizing the number of times the data is copied. Any kind of reshaping operation can be expressed with np.lib.stride_tricks.as_strided.
Assume the original array contains 64-bit integers, then each element is 8 bytes arranged in some shape:
import numpy as np
arr = np.arange(18).reshape((2,9))
arr.shape, arr.strides
Output:
((2, 9), (72, 8))
So each column skips 8 bytes and each row skips 72 bytes. arr.reshape(len(arr), -1, r_size) can be expressed as:
np.lib.stride_tricks.as_strided(arr, (2,3,3), (72,24,8))
Output:
array([[[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8]],
[[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]]])
And arr.reshape(len(arr), -1, r_size)[:, ranges]
can be expressed as:
np.lib.stride_tricks.as_strided(arr, (2,2,3), (72,24*2,8))
Output:
array([[[ 0, 1, 2],
[ 6, 7, 8]],
[[ 9, 10, 11],
[15, 16, 17]]])
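The hard-coded byte counts above assume a C-contiguous 64-bit integer array. The same view can be built from arr.strides so it adapts to any dtype (a sketch; like the stride trick above, it assumes the selected ranges are evenly spaced, here with step 2):

```python
import numpy as np

arr = np.arange(18).reshape((2, 9))
r_size = 3
step = 2  # spacing between selected chunk indices (here ranges = [0, 2])

# Derive the byte strides from the array instead of hard-coding 72/24/8.
row_stride, col_stride = arr.strides
view = np.lib.stride_tricks.as_strided(
    arr,
    shape=(arr.shape[0], 2, r_size),
    strides=(row_stride, col_stride * r_size * step, col_stride),
)
```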
So far, we have only changed the metadata of the array, which means that no data has been copied. This operation has a near-zero performance cost. But to get the final array you will need to copy the data somehow:
np.lib.stride_tricks.as_strided(arr, (2,2,3), (72,24*2,8)).reshape(len(arr), -1)
Output:
array([[ 0, 1, 2, 6, 7, 8],
[ 9, 10, 11, 15, 16, 17]])
This is not a generalized solution (the fixed second stride relies on the selected ranges being evenly spaced), but it might nonetheless give you some ideas on how to optimize.
Unfortunately, my timings do not back up these claims, but the approach is still intuitive and worth testing on larger arrays.
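To check on your own data, a quick timeit comparison between the itertools-based indexing from the question and the reshape-based indexing is easy to set up (a sketch on a larger array; exact numbers will vary by machine and array size):

```python
import timeit
from itertools import chain
import numpy as np

arr = np.arange(2 * 9000).reshape((2, 9000))
r_size = 3
ranges = list(range(0, 3000, 2))  # evenly spaced chunk indices

def with_chain():
    # The approach from the question: build the index list in Python.
    cols = list(chain(*[range(r_size * x, r_size * x + r_size) for x in ranges]))
    return arr[:, cols]

def with_reshape():
    # Reshape to (rows, n_chunks, r_size), fancy-index the chunks, flatten back.
    return arr.reshape(len(arr), -1, r_size)[:, ranges].reshape(len(arr), -1)

# Both approaches must produce identical results.
assert (with_chain() == with_reshape()).all()

t_chain = timeit.timeit(with_chain, number=100)
t_reshape = timeit.timeit(with_reshape, number=100)
print(f"chain:   {t_chain:.4f}s")
print(f"reshape: {t_reshape:.4f}s")
```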
Upvotes: 1
Reputation: 40618
You could start by splitting the array into chunks of r_size columns each:
>>> splits = np.split(arr, arr.shape[1] // r_size, axis=1)
>>> splits
[array([[ 0, 1, 2],
[ 9, 10, 11]]),
array([[ 3, 4, 5],
[12, 13, 14]]),
array([[ 6, 7, 8],
[15, 16, 17]])]
Stack with np.stack and select the correct ranges:
>>> stack = np.stack(splits)[ranges]
>>> stack
array([[[ 0, 1, 2],
[ 9, 10, 11]],
[[ 6, 7, 8],
[15, 16, 17]]])
And concatenate horizontally with np.hstack or np.concatenate on axis=1:
>>> np.hstack(stack)
array([[ 0, 1, 2, 6, 7, 8],
[ 9, 10, 11, 15, 16, 17]])
Overall this looks like:
>>> np.hstack(np.stack(np.split(arr, arr.shape[1] // r_size, axis=1))[ranges])
array([[ 0, 1, 2, 6, 7, 8],
[ 9, 10, 11, 15, 16, 17]])
Alternatively, you can work exclusively with np.reshape, which will be faster:
Initial reshape:
>>> arr.reshape(len(arr), -1, r_size)
array([[[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8]],
[[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]]])
Indexing with ranges:
>>> arr.reshape(len(arr), -1, r_size)[:, ranges]
array([[[ 0, 1, 2],
[ 6, 7, 8]],
[[ 9, 10, 11],
[15, 16, 17]]])
Then, reshaping back to the final form:
>>> arr.reshape(len(arr), -1, r_size)[:, ranges].reshape(len(arr), -1)
array([[ 0,  1,  2,  6,  7,  8],
       [ 9, 10, 11, 15, 16, 17]])
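Wrapped up as a small helper (a sketch; the name take_ranges is made up, and it assumes the number of columns is divisible by r_size):

```python
import numpy as np

def take_ranges(arr, ranges, r_size):
    """Select whole chunks of r_size columns, given the chunk indices."""
    if arr.shape[1] % r_size != 0:
        raise ValueError("number of columns must be a multiple of r_size")
    # Reshape to (rows, n_chunks, r_size), pick chunks, flatten back to 2-D.
    return arr.reshape(len(arr), -1, r_size)[:, ranges].reshape(len(arr), -1)

arr = np.arange(18).reshape((2, 9))
result = take_ranges(arr, [0, 2], r_size=3)
```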
Upvotes: 1