Reputation: 2221

An algorithm to find transitions in Python

I want to implement an algorithm that gets the index of letter changes. I have the below list, here I want to find the beginning of every letter changes and put a result list except the first one. Because, for the first one, we should get the last index of occurrence of it. Let me give you an example:

letters=['A','A','A','A','A','A','A','A','A','A','A','A','B','C','C','X','D','X','B','B','A','A','A','A']

Transitions:

 'A','A','A','A','A','A','A','A','A','A','A','A'-->'B'-->'C','C'-->'X'-->'D'-->'X'-->'B','B'-->'A','A','A','A'

Here, after A letters finish, B starts, we should put the index of last A and the index of first B and so on, but we should not include X letter into the result list.
Desired result:

  [(11, 'A'), (12, 'B'), (13, 'C'), (16, 'D'), (18, 'B'), (20, 'A')]

So far, I have done this code, this finds other items except the (11, 'A'). How can I modify my code to get the desired result?

for i in range(len(letters)):
    if letters[i]!='X' and letters[i]!=letters[i-1]:
        result.append((i,(letters[i])))

My result:

[(12, 'B'), (13, 'C'), (16, 'D'), (18, 'B'), (20, 'A')] ---> missing (11, 'A').

Upvotes: 1

Answers (8)

Blckknght

Reputation: 104712

Here's my suggestion. It has three steps.

Fist, find all the starting indexes for each run of letters.
Replace the index in the first non-X run with the index of the end of its run, which will be one less than the start of the following run.
Filter out all X runs.

The code:

def letter_runs(letters):
    prev = None
    results = []

    for index, letter in enumerate(letters):
        if letter != prev:
            prev = letter
            results.append((index, letter))

    if results[0][1] != "X":
        results[0] = (results[1][0]-1, results[0][1])
    else: # if first run is "X" second must be something else!
        results[1] = (results[2][0]-1, results[1][1])

    return [(index, letter) for index, letter in results if letter != "X"]

Upvotes: -1

Marcin

Reputation: 49826

Here's a solution which uses groupby to generate a single sequence from which both first and last indices can be extracted.

import itertools
import functools
letters = ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'C', 'C', 'X', 'D', 'X', 'B', 'B', 'A', 'A', 'A', 'A']

groupbysecond = functools.partial(itertools.groupby,key=operator.itemgetter(1))

def transitions(letters):
    #segregate transition and non-transition indices
    grouped = groupbysecond(enumerate(zip(letters,letters[1:])))
    # extract first such entry from each group
    firsts = (next(l) for k,l in grouped)
    # group those entries together - where multiple, there are first and last
    # indices of the run of letters
    regrouped = groupbysecond((n,a) for n,(a,b) in firsts)
    # special case for first entry, which wants last index of first letter
    kfirst,lfirst = next(regrouped)
    firstitem = (tuple(lfirst)[-1],) if kfirst != 'X' else ()
    #return first item, and first index for all other letters
    return itertools.chain(firstitem,(next(l) for k,l in regrouped if k != 'X'))

Upvotes: 1

Paulo Almeida

Reputation: 8061

With minimal change to your code, and following Josh Caswell's suggestion:

for i, letter in enumerate(letters[1:], 1):
    if letter != 'X' and letters[i] != letters[i-1]:
        result.append((i, letter))
first_change = result[0][0]
first_stretch = ''.join(letters[:first_change]).rstrip('X')
if first_stretch:
    result.insert(0, (len(first_stretch) - 1, first_stretch[-1]))

Upvotes: 1

Marcin

Reputation: 49826

Now that you've explained you want the first index of every letter after the first, here's a one-liner:

letters=['A','A','A','A','A','A','A','A','A','A','A','A','B','C','C','X','D','X','B','B','A','A','A','A']
[(n+1, b) for (n, (a,b)) in enumerate(zip(letters,letters[1:])) if a!=b and b!='X']
#=> [(12, 'B'), (13, 'C'), (16, 'D'), (18, 'B'), (20, 'A')]

Now, your first entry is different. For this, you need to use a recipe which finds the last index of each item:

import itertools
grouped = [(len(list(g))-1,k) for k,g in (itertools.groupby(letters))]
weird_transitions = [grouped[0]] + [(n+1, b) for (n, (a,b)) in enumerate(zip(letters,letters[1:])) if a!=b and b!='X']
#=> [(11, 'A'), (12, 'B'), (13, 'C'), (16, 'D'), (18, 'B'), (20, 'A')]

Of course, you could avoid creating the whole list of grouped, because you only ever use the first item from groupby. I leave that as an exercise for the reader.

This will also give you an X as the first item, if X is the first (set of) items. Because you say nothing about what you're doing, or why the Xs are there, but omitted, I can't figure out if that's the right behaviour or not. If it's not, then probably use my entire other recipe (in my other answer), and then take the first item from that.

Upvotes: 3

Marcin

Reputation: 49826

You want (Or, you don't, as you finally explained - see my other answer):

import itertools
import functional # get it from pypi
letters=['A','A','A','A','A','A','A','A','A','A','A','A','B','C','C','X','D','X','B','B','A','A','A','A']
grouped = [(len(list(g)),k) for k,g in (itertools.groupby(letters))]
#=> [(12, 'A'), (1, 'B'), (2, 'C'), (1, 'D'), (2, 'B'), (4, 'A')]
#-1 to take this from counts to indices
filter(lambda (a,b): b!='X',functional.scanl(lambda (a,b),(c,d): (a+c,d), (-1,'X'), grouped))
#=> [(11, 'A'), (12, 'B'), (14, 'C'), (16, 'D'), (19, 'B'), (23, 'A')]

This gives you the last index of each letter run, other than Xs. If you want the first index after the relevant letter, then switch the -1 to 0.

scanl is a reduce which returns intermediate results.

As a general rule, it makes sense to either filter first or last, unless that is for some reason expensive, or the filtering can easily be accomplished without increasing complexity.

Also, your code is relatively hard to read and understand, because you iterate by index. That's unusual in python, unless manipulating the index numerically. If you're visiting every item, it's usual to iterate directly.

Also, why do you want this particular format? It's usual to have the format as (unique item,data) because that can easily be placed in a dict.

Upvotes: 2

JSutton

Reputation: 71

Your question is a bit confusing, but this code should do what you want.

firstChangeFound = False
for i in range(len(letters)):
    if letters[i]!='X' and letters[i]!=letters[i-1]:
        if not firstChangeFound:
            result.append((i-1, letters[i-1])) #Grab the last occurrence of the first character
            result.append((i, letters[i]))
            firstChangeFound = True
        else:
             result.append((i, letters[i]))

Upvotes: 2

Saher Ahwal

Reputation: 9237

With an aid of dictionary to keep running time linear in number of input, here is a solution:

letters=['A','A','A','A','A','A','A','A','A','A','A','A','B','C','C','X','D','X','B','B','A','A','A','A']

def f(letters):
    result = []
    added = {}
    for i in range(len(letters)):
        if (i+1 == len(letters)):            
            break            
        if letters[i+1]!='X' and letters[i+1]!=letters[i]:
            if(i not in added and letters[i]!='X'):
                result.append((i, letters[i]))
                added[i] = letters[i]
            if(i+1 not in added):
                result.append((i+1, letters[i+1]))
                added[i+1] = letters[i+1]
    return result

Basically, my the solution always tries to add both indices where a change occurred. But the dictionary (which has constant time lookup tells us if we already added the element or not to exclude duplicates). This takes care of adding the first element. Otherwise you can use an if statement to indicate first round which will only run once. However, I argue that this solution has same running time. As long as you do not check if you added the element by looking up the list itself (since this is linear time lookup at worst), this will result in O(n^2) time which is bad!

Upvotes: 0

sihrc

Reputation: 2828

letters=['A','A','A','A','A','A','A','A','A','A','A','A','B','C','C','X','D','X','B','B','A','A','A','A']
    #     0   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23 
prev = letters[0]
result = []
for i in range(len(letters)):
    if prev!=letters[i]:
        result.append((i-1,prev))
    if letters[i]!='X':
        prev = letters[i]
    else:
        prev = letters[i+1]

result.append((len(letters)-1,letters[-1]))
print result

RESULTS IN: (Not OP's desired results, sorry I must have misunderstood. see JSutton's ans)

[(11,'A'), (12,'B'), (14,'C'), (16,'D'), (19,'B'), (23,'A')]

which is actually the index of the last instance of a letter before they change or the list ends.

Upvotes: 0

An algorithm to find transitions in Python

Answers (8)

Related Questions