Reputation: 2573
Is there a smart and easy way to combine two slicing operations into one?
Say I have something like
>>> arange(1000)[::2][10:20]
array([20, 22, 24, 26, 28, 30, 32, 34, 36, 38])
Of course, in this example this is not a problem, but if the arrays are very large I would very much like to avoid creating the intermediate array (or is there none?). I believe it should be possible to combine the two slices, but maybe I'm overlooking something. So the idea would be something like:
arange(1000)[ slice(None,None,2) + slice(10,20,None) ]
This of course does not work, but it is what I would like to do. Is there anything that combines slice objects? (Despite my efforts, I did not find anything.)
Upvotes: 11
Views: 12372
Reputation: 11
Here is an improved version of @dlitz's code that passes @well's tests.
def combine_slices(length, *slices):
    r = range(length)
    for s in slices:
        r = r[s]
    if len(r) == 0:
        return slice(0)
    elif r.stop < 0:
        return slice(r.start, None, r.step)
    return slice(r.start, r.stop, r.step)
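A short check of why the two extra branches matter. This is only a sketch: combine_slices_naive is a label made up here for the same function without the guards, and a plain list stands in for the array.

```python
def combine_slices_naive(length, *slices):
    # Same idea without the guards: returns the range's fields unchanged.
    r = range(length)
    for s in slices:
        r = r[s]
    return slice(r.start, r.stop, r.step)


def combine_slices(length, *slices):
    # Version from this answer, with the empty and negative-stop guards.
    r = range(length)
    for s in slices:
        r = r[s]
    if len(r) == 0:
        return slice(0)
    elif r.stop < 0:
        return slice(r.start, None, r.step)
    return slice(r.start, r.stop, r.step)


data = list(range(10))
reverse = slice(None, None, -1)

print(combine_slices_naive(len(data), reverse))        # slice(9, -1, -1)
print(data[combine_slices_naive(len(data), reverse)])  # []
print(combine_slices(len(data), reverse))              # slice(9, None, -1)
print(data[combine_slices(len(data), reverse)] == data[::-1])  # True
```

The range reports a stop of -1 to mean "one before index 0", but inside a slice -1 means "the last element", so the unguarded result selects nothing.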
Upvotes: 1
Reputation: 6187
In Python 3, the built-in range object can do the calculation for you without expanding to fill memory:
def combine_slices(length, *slices):
    r = range(length)  # length of array being sliced
    for s in slices:
        r = r[s]
    return slice(r.start, r.stop, r.step)
arr = range(-2**48, 2**48) # simulate a huge array
s = combine_slices(len(arr), slice(2**48,None), slice(None,None,2), slice(10,20,None))
print(arr[s] == arr[2**48:][::2][10:20]) # => True
print(list(arr[s])) # => [20, 22, 24, 26, 28, 30, 32, 34, 36, 38]
print(s) # => slice(281474976710676, 281474976710696, 2)
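If the operator-style syntax from the question is wanted, the same range trick can sit behind an overloaded operator. The built-in slice type cannot be subclassed, so the sketch below uses a small wrapper instead; SliceChain and resolve are hypothetical names invented for this illustration, and the negative-stop handling is an added assumption rather than part of the code above.

```python
class SliceChain:
    """Hypothetical helper: records slices to be applied one after another."""

    def __init__(self, *slices):
        self.slices = tuple(slices)

    def __add__(self, other):
        # Accept either another chain or a plain slice on the right-hand side.
        extra = other.slices if isinstance(other, SliceChain) else (other,)
        return SliceChain(*self.slices, *extra)

    def resolve(self, length):
        """Collapse the chain into one slice for a sequence of `length` items."""
        r = range(length)
        for s in self.slices:
            r = r[s]
        if len(r) == 0:
            return slice(0)
        # A negative stop would be read as "count from the end", so use None.
        stop = None if r.stop < 0 else r.stop
        return slice(r.start, stop, r.step)


data = list(range(1000))
chain = SliceChain(slice(None, None, 2)) + slice(10, 20, None)
combined = chain.resolve(len(data))

print(combined)                                 # slice(20, 40, 2)
print(data[combined] == data[::2][10:20])       # True
```

The length still has to be supplied at the end, because a chain of slices only becomes a single slice once the size of the sequence is known.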
Upvotes: 1
Reputation: 667
As @Tigran said, slicing costs nothing when using NumPy arrays. However, in general we can combine two slices in series using the information from slice.indices, which
Retrieve[s] the start, stop, and step indices from the slice object slice assuming a sequence of length length
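A few calls illustrate what slice.indices returns for a given length. The slices and lengths here are arbitrary values chosen for illustration.

```python
# slice.indices(length) fills in None and resolves negative or
# out-of-range values against a sequence of the given length.
print(slice(None, None, 2).indices(1000))  # (0, 1000, 2)
print(slice(10, 20).indices(500))          # (10, 20, 1)
print(slice(-5, None).indices(10))         # (5, 10, 1)
print(slice(10, 2000).indices(100))        # (10, 100, 1)
print(slice(None, None, -1).indices(10))   # (9, -1, -1)
```

Note the last line: a stop of -1 coming from indices means "one before index 0", which is not what -1 means when written directly inside a slice.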
We can reduce x[slice1][slice2] to x[combined].
The first slicing returns a new object, which is then sliced by the second slicing. So we also need the length of our data object (its length in the first dimension) to combine the slices properly.
So, we can write
def slice_combine(slice1, slice2, length):
    """
    Returns a slice that is a combination of the two slices.
    As in
      combined_slice = slice_combine(slice1, slice2, len(x))

    :param slice1: The first slice
    :param slice2: The second slice
    :param length: The length of the first dimension of data being sliced. (eg len(x))
    """
    # First get the step sizes of the two slices.
    slice1_step = (slice1.step if slice1.step is not None else 1)
    slice2_step = (slice2.step if slice2.step is not None else 1)

    # The final step size
    step = slice1_step * slice2_step

    # Use slice1.indices to get the actual indices returned from slicing with slice1
    slice1_indices = slice1.indices(length)

    # We calculate the length of the first slice
    slice1_length = (abs(slice1_indices[1] - slice1_indices[0]) - 1) // abs(slice1_indices[2])

    # If we step in the same direction as the start,stop, we get at least one datapoint
    if (slice1_indices[1] - slice1_indices[0]) * slice1_step > 0:
        slice1_length += 1
    else:
        # Otherwise, the slice is zero length.
        return slice(0, 0, step)

    # Use the length after the first slice to get the indices returned from a
    # second slice starting at 0.
    slice2_indices = slice2.indices(slice1_length)

    # If the final range length = 0, return
    if not (slice2_indices[1] - slice2_indices[0]) * slice2_step > 0:
        return slice(0, 0, step)

    # We shift slice2_indices by the starting index in slice1 and the
    # step size of slice1
    start = slice1_indices[0] + slice2_indices[0] * slice1_step
    stop = slice1_indices[0] + slice2_indices[1] * slice1_step

    # slice.indices will return -1 as the stop index when slice.stop should be set to None.
    if start > stop:
        if stop < 0:
            stop = None

    return slice(start, stop, step)
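To see the arithmetic on the example from the question, the same steps can be traced by hand. This is a simplified sketch for positive steps only; it uses len(range(...)) as a shortcut for the length of the first slice, which is an assumption of this illustration and not how the function above computes it.

```python
length = 1000
slice1 = slice(None, None, 2)
slice2 = slice(10, 20, None)

# Resolve the first slice against the full length.
start1, stop1, step1 = slice1.indices(length)   # (0, 1000, 2)
len1 = len(range(start1, stop1, step1))         # 500 elements survive

# Resolve the second slice against the length left after the first.
start2, stop2, step2 = slice2.indices(len1)     # (10, 20, 1)

# Shift by the first start and scale by the first step.
combined = slice(start1 + start2 * step1,
                 start1 + stop2 * step1,
                 step1 * step2)

print(combined)  # slice(20, 40, 2)
data = list(range(length))
print(data[combined] == data[slice1][slice2])  # True
```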
Then, let's run some tests
import sys
import numpy as np
# Make a 1D dataset
x = np.arange(100)
l = len(x)
# Make a (100, 10) dataset
x2 = np.arange(1000)
x2 = x2.reshape((100,10))
l2 = len(x2)
# Test indices and steps
indices = [None, -1000, -100, -99, -50, -10, -1, 0, 1, 10, 50, 99, 100, 1000]
steps = [-1000, -99, -50, -10, -3, -2, -1, 1, 2, 3, 10, 50, 99, 1000]
indices_l = len(indices)
steps_l = len(steps)
count = 0
total = 2 * indices_l**4 * steps_l**2
for i in range(indices_l):
    for j in range(indices_l):
        for k in range(steps_l):
            for q in range(indices_l):
                for r in range(indices_l):
                    for s in range(steps_l):
                        # Print the progress. There are a lot of combinations.
                        if count % 5197 == 0:
                            sys.stdout.write("\rPROGRESS: {0:,}/{1:,} ({2:.0f}%)".format(count, total, float(count) / float(total) * 100))
                        slice1 = slice(indices[i], indices[j], steps[k])
                        slice2 = slice(indices[q], indices[r], steps[s])
                        combined = slice_combine(slice1, slice2, l)
                        combined2 = slice_combine(slice1, slice2, l2)
                        np.testing.assert_array_equal(x[slice1][slice2], x[combined],
                            err_msg="For 1D, slice1: {0},\tslice2: {1},\tcombined: {2}\tCOUNT: {3}".format(slice1, slice2, combined, count))
                        np.testing.assert_array_equal(x2[slice1][slice2], x2[combined2],
                            err_msg="For 2D, slice1: {0},\tslice2: {1},\tcombined: {2}\tCOUNT: {3}".format(slice1, slice2, combined2, count))
                        # 2 tests per loop
                        count += 2
print("All {0:,} tests passed!".format(count))
And thankfully we get
All 15,059,072 tests passed!
Upvotes: 1
Reputation: 3582
To make such a superposition of slices possible, just override __add__ (or __mul__; a mathematician would surely prefer the * notation for superposition). But it is going to involve some math. By the way, you could make a nice Python package with this stuff ;-)
P.S. In general, multiple slicing can be used to make code nicer and much clearer. Even a simple choice between one of the following lines:
v = A[::2][10:20]
v = A[20:40][::2]
v = A[20:40:2]
can deeply reflect program logic, making code self-documenting.
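A quick check that the three spellings select the same elements. A plain list stands in for A here, which is an assumption of this sketch.

```python
A = list(range(1000))  # stand-in for the array A

first = A[::2][10:20]    # every second element, then items 10..19 of that
second = A[20:40][::2]   # elements 20..39, then every second one
third = A[20:40:2]       # the same selection written as a single slice

print(first == second == third)  # True
print(third)  # [20, 22, 24, 26, 28, 30, 32, 34, 36, 38]
```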
One more example: if you have a flat NumPy array and you wish to extract a subarray of length length starting at position position, you can write either of the following:
v = A[position : position + length]
v = A[position:][:length]
Decide for yourself which option looks better. ;-)
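One practical difference between the two spellings is worth knowing: they agree for non-negative positions but can differ when position is negative. A small sketch, with a plain list standing in for the flat array and arbitrary values for position and length:

```python
A = list(range(10))  # stand-in for a flat array

position, length = 3, 4
print(A[position:position + length])  # [3, 4, 5, 6]
print(A[position:][:length])          # [3, 4, 5, 6]

# With a negative position the stop of the first form evaluates to 0,
# so it selects nothing, while the two-step form still works.
position, length = -3, 3
print(A[position:position + length])  # []
print(A[position:][:length])          # [7, 8, 9]
```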
Upvotes: 5
Reputation: 12401
You can use islice, which probably won't be any faster, but will avoid the intermediate entries by working as a generator:
arange = range(1000)
from itertools import islice
islice(islice(arange, None, None, 2), 10, 20)
%timeit list(islice(islice(arange, None, None, 2), 10, 20))
100000 loops, best of 3: 2 us per loop
%timeit arange[::2][10:20]
100000 loops, best of 3: 2.64 us per loop
So, a little faster.
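One thing to keep in mind is that islice cannot jump ahead: it walks through the skipped items one by one, and it rejects negative indices with a ValueError. The sketch below makes the walking visible; counted is a hypothetical helper written for this illustration that records every item pulled from the source.

```python
from itertools import islice


def counted(n, seen):
    # Yield 0..n-1 and record each item that is actually produced.
    for i in range(n):
        seen.append(i)
        yield i


seen = []
result = list(islice(islice(counted(1000, seen), None, None, 2), 10, 20))

print(result)     # [20, 22, 24, 26, 28, 30, 32, 34, 36, 38]
print(len(seen))  # items pulled from the source: well under 1000, but more than 10
```

No intermediate list is built, but the cost still grows with the start offset, so for large offsets a combined slice on an indexable sequence is likely the better fit.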
Upvotes: 0