Reputation: 2573

combining two slicing operations

Is there a smart and easy way to combine two slicing operations into one?

Say I have something like

>>> arange(1000)[::2][10:20]
array([20, 22, 24, 26, 28, 30, 32, 34, 36, 38])

Of course, in this example this is not a problem, but if the arrays are very large I would very much like to avoid creating the intermediate array (or is there none?). I believe it should be possible to combine the two slices, but maybe I'm overlooking something. So the idea would be something like:

arange(1000)[ slice(None,None,2) + slice(10,20,None) ]

This of course does not work, but it is what I would like to do. Is there anything that combines slice objects? (Despite my efforts, I did not find anything.)

Upvotes: 11

Answers (6)

Alister Trabattoni

Reputation: 11

Here is an improved version of @dlitz's code that passes @well's tests.

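def combine_slices(length, *slices):
    # Let range do the index arithmetic without materializing any elements.
    r = range(length)
    for s in slices:
        r = r[s]
    if len(r) == 0:
        return slice(0)
    elif r.stop < 0:
        # A negative stop means the range runs past index 0; as a slice
        # bound it would count from the end, so use None instead.
        return slice(r.start, None, r.step)
    else:
        return slice(r.start, r.stop, r.step)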
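To see why the elif r.stop < 0 branch is needed, here is a small check in plain Python 3 (the variable names are only for illustration). When the composed step is negative and the range runs down to index 0, range reports a negative stop (-1, or for example -2 after nesting), but a negative slice bound counts from the end of the sequence:

```python
data = list(range(10))

r1 = range(10)[::-1]        # range(9, -1, -1): stop is -1
r2 = range(10)[::2][::-1]   # range(8, -2, -2): stop is -2 after nesting

for r in (r1, r2):
    naive = data[r.start:r.stop:r.step]  # negative stop counts from the end
    fixed = data[r.start:None:r.step]    # None means "run past index 0"
    print(r, naive, fixed)
# range(9, -1, -1) [] [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
# range(8, -2, -2) [] [8, 6, 4, 2, 0]
```

Copying r.stop straight into a slice (as the original version of combine_slices does) silently returns an empty result in these cases.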

Upvotes: 1

dlitz

Reputation: 6187

In Python 3, the built-in range object can do the calculation for you without expanding to fill memory:

def combine_slices(length, *slices):
    r = range(length)   # length of array being sliced
    for s in slices:
        r = r[s]
    return slice(r.start, r.stop, r.step)

arr = range(-2**48, 2**48)   # simulate a huge array
s = combine_slices(len(arr), slice(2**48,None), slice(None,None,2), slice(10,20,None))

print(arr[s] == arr[2**48:][::2][10:20])    # => True
print(list(arr[s]))     # => [20, 22, 24, 26, 28, 30, 32, 34, 36, 38]
print(s)    # => slice(281474976710676, 281474976710696, 2)

Upvotes: 1

well

Reputation: 667

As @Tigran said, slicing costs nothing when using NumPy arrays. However, in general we can combine two slices in series using the information from slice.indices, which will

Retrieve[s] the start, stop, and step indices from the slice object slice assuming a sequence of length length

We can reduce

x[slice1][slice2]

to

x[combined]

The first slicing returns a new object, which is then sliced by the second slicing. So we also need the length of the data being sliced (its length in the first dimension) to combine the slices properly.
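A quick look at what slice.indices returns, and at the element-count formula the function below relies on. This is a small sketch with an arbitrary length of 100 and a step of -3, chosen only for illustration:

```python
length = 100
s = slice(None, None, -3)

# Normalized (start, stop, step) for a sequence of this length.
start, stop, step = s.indices(length)
print(start, stop, step)  # 99 -1 -3

# Number of selected elements, computed two ways:
n_range = len(range(start, stop, step))
n_formula = (abs(stop - start) - 1) // abs(step) + 1  # valid for a non-empty slice
print(n_range, n_formula)  # 34 34
```

Note the stop of -1: it is a valid range bound, but passed back into a slice it would mean "the last element".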

So, we can write

def slice_combine(slice1, slice2, length):
    """
    returns a slice that is a combination of the two slices.
    As in 
      x[slice1][slice2]
    becomes
      combined_slice = slice_combine(slice1, slice2, len(x))
      x[combined_slice]

    :param slice1: The first slice
    :param slice2: The second slice
    :param length: The length of the first dimension of data being sliced (e.g. len(x)).
    """

    # First get the step sizes of the two slices.
    slice1_step = (slice1.step if slice1.step is not None else 1)
    slice2_step = (slice2.step if slice2.step is not None else 1)

    # The final step size
    step = slice1_step * slice2_step

    # Use slice1.indices to get the actual indices returned from slicing with slice1
    slice1_indices = slice1.indices(length)

    # We calculate the length of the first slice
    slice1_length = (abs(slice1_indices[1] - slice1_indices[0]) - 1) // abs(slice1_indices[2])

    # If the step points from start towards stop, the first slice has at least one element.
    if (slice1_indices[1] - slice1_indices[0]) * slice1_step > 0:
        slice1_length += 1
    else:
        # Otherwise, the first slice is empty.
        return slice(0,0,step)

    # Use the length after the first slice to get the indices returned from a
    # second slice starting at 0.
    slice2_indices = slice2.indices(slice1_length)

    # If the second slice selects nothing, return an empty slice.
    if not (slice2_indices[1] - slice2_indices[0]) * slice2_step > 0:
        return slice(0,0,step)

    # We shift slice2_indices by the starting index in slice1 and the 
    # step size of slice1
    start = slice1_indices[0] + slice2_indices[0] * slice1_step
    stop = slice1_indices[0] + slice2_indices[1] * slice1_step

    # With a negative step, a negative stop means "run past index 0"; as a slice
    # bound it would count from the end of the array, so use None instead.
    if start > stop:
        if stop < 0:
            stop = None

    return slice(start, stop, step)
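The index arithmetic behind slice_combine can be written out as follows. Here (a, b, s) stands for slice1.indices(length), n_1 for the length of x[slice1], and (c, d, t) for slice2.indices(n_1); these symbols are introduced only for this derivation. Element k of x[slice1][slice2] is element c + kt of x[slice1], which in turn is element a + (c + kt)s of x:

```latex
\[
\bigl(x[\mathrm{slice1}][\mathrm{slice2}]\bigr)_k
  = \bigl(x[\mathrm{slice1}]\bigr)_{c + k t}
  = x_{a + (c + k t)\,s}
  = x_{(a + c\,s) + k\,(s\,t)}
\]
\[
\mathrm{start} = a + c\,s, \qquad
\mathrm{stop} = a + d\,s, \qquad
\mathrm{step} = s\,t
\]
```

These match start, stop and step in the code. When the combined step is negative, stop can come out negative, which is why the code then replaces it with None.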

Then, let's run some tests

import sys
import numpy as np

# Make a 1D dataset
x = np.arange(100)
l = len(x)

# Make a (100, 10) dataset
x2 = np.arange(1000)
x2 = x2.reshape((100,10))
l2 = len(x2)

# Test indices and steps
indices = [None, -1000, -100, -99, -50, -10, -1, 0, 1, 10, 50, 99, 100, 1000]
steps = [-1000, -99, -50, -10, -3, -2, -1, 1, 2, 3, 10, 50, 99, 1000]
indices_l = len(indices)
steps_l = len(steps)

count = 0
total = 2 * indices_l**4 * steps_l**2
for i in range(indices_l):
    for j in range(indices_l):
        for k in range(steps_l):
            for q in range(indices_l):
                for r in range(indices_l):
                    for s in range(steps_l):
                        # Print the progress. There are a lot of combinations.
                        if count % 5197 == 0:
                            sys.stdout.write("\rPROGRESS: {0:,}/{1:,} ({2:.0f}%)".format(count, total, float(count) / float(total) * 100))
                            sys.stdout.flush()

                        slice1 = slice(indices[i], indices[j], steps[k])
                        slice2 = slice(indices[q], indices[r], steps[s])

                        combined = slice_combine(slice1, slice2, l)
                        combined2 = slice_combine(slice1, slice2, l2)
                        np.testing.assert_array_equal(x[slice1][slice2], x[combined], 
                            err_msg="For 1D, slice1: {0},\tslice2: {1},\tcombined: {2}\tCOUNT: {3}".format(slice1, slice2, combined, count))
                        np.testing.assert_array_equal(x2[slice1][slice2], x2[combined2], 
                            err_msg="For 2D, slice1: {0},\tslice2: {1},\tcombined: {2}\tCOUNT: {3}".format(slice1, slice2, combined2, count))

                        # 2 tests per loop
                        count += 2

print("\n-----------------")
print("All {0:,} tests passed!".format(count))

And thankfully we get

All 15,059,072 tests passed!

Upvotes: 1

Tigran Saluev

Reputation: 3582

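You can make such a superposition of slices possible with a small class of your own. Note that slice itself cannot be subclassed in CPython (inheriting from it raises a TypeError), so it has to be a wrapper around slice objects. Just override __add__ (or __mul__; a mathematician would surely prefer * notation for superposition). But it is going to involve some math. By the way, you could make a nice Python package with this stuff ;-)
As bheklilr said, slicing costs nothing in NumPy: basic slicing returns a view, so no data is copied. So you can just go with a simple solution such as a list of slices.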
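The wrapper idea can be sketched as follows. SliceChain and its resolve method are hypothetical names made up for illustration. Because slice cannot be subclassed, the class stores slices rather than extending slice. Composing needs the length of the sequence being sliced (open bounds and negative indices depend on it), so the combined slice is only produced once the length is known. This sketch lets Python 3's range object do the arithmetic, as in dlitz's answer:

```python
class SliceChain:
    """Hypothetical wrapper that records slices and composes them on demand."""

    def __init__(self, *slices):
        self.slices = tuple(slices)

    def __mul__(self, other):
        # a * b means "apply a, then b", i.e. x[a][b].
        extra = other.slices if isinstance(other, SliceChain) else (other,)
        return SliceChain(*self.slices, *extra)

    def resolve(self, length):
        """Return one slice equivalent to applying all slices in order."""
        r = range(length)  # range slicing does the index arithmetic lazily
        for s in self.slices:
            r = r[s]
        if len(r) == 0:
            return slice(0, 0)
        # A negative stop means "run past index 0"; None expresses that.
        stop = r.stop if r.stop >= 0 else None
        return slice(r.start, stop, r.step)


chain = SliceChain(slice(None, None, 2)) * slice(10, 20)
print(chain.resolve(1000))  # slice(20, 40, 2)
```

With NumPy, indexing arange(1000) with chain.resolve(1000) then performs a single slicing step.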

P.S. In general, chained slicing can make code nicer and much clearer. Even the choice among the following equivalent lines:

v = A[::2][10:20]
v = A[20:40][::2]
v = A[20:40:2]

can deeply reflect program logic, making code self-documenting.
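A quick check, assuming NumPy and an array with at least 40 elements, that the three lines select the same elements and that none of them copies data (each result shares memory with A):

```python
import numpy as np

A = np.arange(100)
v1 = A[::2][10:20]
v2 = A[20:40][::2]
v3 = A[20:40:2]

# Same elements in all three cases.
print(np.array_equal(v1, v3) and np.array_equal(v2, v3))  # True

# Basic slicing returns views, so all three share A's memory.
print(np.shares_memory(v1, A), np.shares_memory(v2, A), np.shares_memory(v3, A))
# True True True
```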

One more example: if you have a flat NumPy array and you wish to extract a subarray of length length starting at index position, you can do

v = A[position : position + length]

or

v = A[position:][:length]

Decide for yourself which option looks better. ;-)

Upvotes: 5

Corley Brigman

Reputation: 12401

You can use islice, which probably won't be any faster, but it avoids creating the intermediate list because it works lazily, as an iterator:

arange = range(1000)

from itertools import islice
islice(islice(arange, None, None, 2), 10, 20)

%timeit list(islice(islice(arange, None, None, 2), 10, 20))
100000 loops, best of 3: 2 us per loop

%timeit arange[::2][10:20]
100000 loops, best of 3: 2.64 us per loop

So, a little faster.
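The timings above appear to come from Python 2, where range returns a list. In Python 3 (an assumption about the reader's version), slicing a range object already returns another range without materializing any elements, and the islice chain yields the same values from any iterable:

```python
from itertools import islice

r = range(1000)

# Python 3: slicing a range gives another range; nothing is materialized.
combined = r[::2][10:20]
print(combined)  # range(20, 40, 2)

# The islice chain produces the same elements lazily.
print(list(islice(islice(r, None, None, 2), 10, 20)) == list(combined))  # True
```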

Upvotes: 0

prgao

Reputation: 1787

Very simple:

arange(1000)[20:40:2]

should do it.
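The 20:40:2 can be derived mechanically in the simple case. Below is a sketch of a hypothetical helper, compose_simple, that only handles start/stop values that are None or non-negative and steps that are None or positive (negative indices or steps need the array length, as in the other answers). Element j of x[first] is x[a + s*j], so element k of x[first][second] is x[a + s*c + s*t*k]; the index must stay below first.stop, and j must stay below second.stop:

```python
def compose_simple(first, second):
    """Combine x[first][second] into one slice (hypothetical helper).

    Assumes all start/stop values are None or >= 0 and both steps are
    None or > 0; no array length is needed in that case.
    """
    a = first.start or 0
    s = first.step or 1
    c = second.start or 0
    t = second.step or 1

    start = a + s * c
    limits = []
    if first.stop is not None:
        limits.append(first.stop)        # index bound from the first slice
    if second.stop is not None:
        limits.append(a + s * second.stop)  # position bound from the second slice
    stop = min(limits) if limits else None
    return slice(start, stop, s * t)


print(compose_simple(slice(None, None, 2), slice(10, 20)))  # slice(20, 40, 2)
```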

Upvotes: -1
