max
max

Reputation: 52235

Reverse slice from end of list to a specific index

Let's say I need a slice from the end of a sequence seq to the first occurrence of a given item x (inclusive). The naive attempt to write seq[-1:seq.index(x)-1:-1] creates a subtle bug:

seq = 'abc'
seq[-1:seq.index('b')-1:-1]  # 'cb' as expected
seq[-1:seq.index('a')-1:-1]  # '' because -1 is interpreted as end of seq

Is there any idiomatic way to write this?

seq[seq.index(x):][::-1] works fine, but it is presumably inefficient for large sequences since it creates an extra copy. (I do need a sequence in the end, so one copy is necessary; I just don't want to create a second copy.)

On a side note, this is a really easy bug to introduce, it can pass many tests, and is undetectable to any static analyzer (unless it warns about every slice with a negative step).

Update

There seems to be no perfect / idiomatic solution. I agree that it may not be the bottleneck as often as I thought, so I'll use [pos:][::-1] in most cases. When performance is important, I'd use the normal if check. However, I'll accept the solution that I found interesting even though it's hard to read; it's probably usable in certain rare cases (where I really need to fit the whole thing into an expression, and I don't want to define a new function).

Also, I tried timing this. For lists it seems there's always a 2x penalty for an extra slice even if they are as short as 2 items. For strings, the results are extremely inconsistent, to the point that I can't say anything:

import timeit
for n in (2, 5, 10, 100, 1000, 10000, 100000, 1000000):
    c = list(range(n))
    # c = 'x' * n
    pos = n // 2 # pretend the item was found in the middle
    exprs = 'c[pos:][::-1]', 'c[:pos:-1] if pos else c[::-1]'
    results = [timeit.Timer(expr, globals=globals()).autorange() for expr in exprs]
    times = [t/loops for loops, t in results]
    print(n, times[0]/times[1])

Results for lists (ratio of extra slice / no extra slice times):

2 2.667782437753884
5 2.2672817613246914
10 1.4275235266754878
100 1.6167102119737584
1000 1.7309116253903338
10000 3.606259720606781
100000 2.636049703318956
1000000 1.9915776615090277

Of course, this ignores the fact that whatever it is we're doing with the resulting slice is much more costly, in relative terms, when the slice is short. So still, I agree that for sequences of small size, [::-1] is usually perfectly fine.

Upvotes: 4

Views: 1522

Answers (3)

PM 2Ring
PM 2Ring

Reputation: 55469

IMHO, seq[seq.index(x):][::-1] is the most readable solution, but here's a way that's a little more efficient.

def sliceback(seq, key):
    pos = seq.index(key)
    return seq[:pos-1 if pos else None:-1]

seq = 'abc'
for k in seq:
    print(k, sliceback(seq, k)) 

output

a cba
b cb
c c

As Budo Zindovic mentions in the comments, .index will raise an exception if the char isn't found in the string. Depending on the context, the code may not ever be called with a char that's not in seq, but if it's possible we need to handle it. The simplest way to do that is to catch the exception:

def sliceback(seq, key):
    try:
        pos = seq.index(key)
    except ValueError:
        return ''
    return seq[:pos-1 if pos else None:-1]

seq = 'abc'
for k in 'abcd':
    print(k, sliceback(seq, k)) 

output

a cba
b cb
c c
d 

Python exception handling is very efficient. When the exception isn't actually raised it's faster than equivalent if-based code, but if the exception is raised more than 5-10% of the time it's faster to use an if.

Rather than testing for key before calling seq.index, it's more efficient to use find. Of course, that will only work if seq is a string; it won't work if seq is a list because (annoyingly) lists don't have a .find method.

def sliceback(seq, key):
    pos = seq.find(key)
    return '' if pos < 0 else seq[:pos-1 if pos else None:-1]

Upvotes: 3

user2357112
user2357112

Reputation: 280345

If an iterator result is okay, use a forward slice and call reversed on it:

reversed(seq[seq.index(whatever):])

If it isn't, subtract an extra len(seq) from the endpoint:

seq[:seq.index(whatever)-len(seq)-1:-1]

Or just take a forward slice, slice it again to reverse it, and eat the cost of the extra copy. It's probably not your bottleneck.

Whatever you do, leave a comment explaining it so people don't reintroduce the bug when editing, and write a unit test for this case.

Upvotes: 3

Mohd
Mohd

Reputation: 5613

You can check for pos while assigning the string, for example:

result = seq[-1:pos-1:-1] if pos > 0 else seq[::-1]

input:

pos = seq.index('a')

output:

cba

Upvotes: 0

Related Questions