Reputation: 2221
I want to implement an algorithm that gets the index of letter changes. I have the below list, here I want to find the beginning of every letter changes and put a result list except the first one. Because, for the first one, we should get the last index of occurrence of it. Let me give you an example:
letters=['A','A','A','A','A','A','A','A','A','A','A','A','B','C','C','X','D','X','B','B','A','A','A','A']
Transitions:
'A','A','A','A','A','A','A','A','A','A','A','A'-->'B'-->'C','C'-->'X'-->'D'-->'X'-->'B','B'-->'A','A','A','A'
Here, after A letters finish, B starts, we should put the index of last A and the index of first B and so on, but we should not include X letter into the result list.
Desired result:
[(11, 'A'), (12, 'B'), (13, 'C'), (16, 'D'), (18, 'B'), (20, 'A')]
So far, I have done this code, this finds other items except the (11, 'A'). How can I modify my code to get the desired result?
for i in range(len(letters)):
if letters[i]!='X' and letters[i]!=letters[i-1]:
result.append((i,(letters[i])))
My result:
[(12, 'B'), (13, 'C'), (16, 'D'), (18, 'B'), (20, 'A')] ---> missing (11, 'A').
Upvotes: 1
Views: 2665
Reputation: 104712
Here's my suggestion. It has three steps.
The code:
def letter_runs(letters):
prev = None
results = []
for index, letter in enumerate(letters):
if letter != prev:
prev = letter
results.append((index, letter))
if results[0][1] != "X":
results[0] = (results[1][0]-1, results[0][1])
else: # if first run is "X" second must be something else!
results[1] = (results[2][0]-1, results[1][1])
return [(index, letter) for index, letter in results if letter != "X"]
Upvotes: -1
Reputation: 49826
Here's a solution which uses groupby
to generate a single sequence from which both first and last indices can be extracted.
import itertools
import functools
letters = ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'C', 'C', 'X', 'D', 'X', 'B', 'B', 'A', 'A', 'A', 'A']
groupbysecond = functools.partial(itertools.groupby,key=operator.itemgetter(1))
def transitions(letters):
#segregate transition and non-transition indices
grouped = groupbysecond(enumerate(zip(letters,letters[1:])))
# extract first such entry from each group
firsts = (next(l) for k,l in grouped)
# group those entries together - where multiple, there are first and last
# indices of the run of letters
regrouped = groupbysecond((n,a) for n,(a,b) in firsts)
# special case for first entry, which wants last index of first letter
kfirst,lfirst = next(regrouped)
firstitem = (tuple(lfirst)[-1],) if kfirst != 'X' else ()
#return first item, and first index for all other letters
return itertools.chain(firstitem,(next(l) for k,l in regrouped if k != 'X'))
Upvotes: 1
Reputation: 8061
With minimal change to your code, and following Josh Caswell's suggestion:
for i, letter in enumerate(letters[1:], 1):
if letter != 'X' and letters[i] != letters[i-1]:
result.append((i, letter))
first_change = result[0][0]
first_stretch = ''.join(letters[:first_change]).rstrip('X')
if first_stretch:
result.insert(0, (len(first_stretch) - 1, first_stretch[-1]))
Upvotes: 1
Reputation: 49826
Now that you've explained you want the first index of every letter after the first, here's a one-liner:
letters=['A','A','A','A','A','A','A','A','A','A','A','A','B','C','C','X','D','X','B','B','A','A','A','A']
[(n+1, b) for (n, (a,b)) in enumerate(zip(letters,letters[1:])) if a!=b and b!='X']
#=> [(12, 'B'), (13, 'C'), (16, 'D'), (18, 'B'), (20, 'A')]
Now, your first entry is different. For this, you need to use a recipe which finds the last index of each item:
import itertools
grouped = [(len(list(g))-1,k) for k,g in (itertools.groupby(letters))]
weird_transitions = [grouped[0]] + [(n+1, b) for (n, (a,b)) in enumerate(zip(letters,letters[1:])) if a!=b and b!='X']
#=> [(11, 'A'), (12, 'B'), (13, 'C'), (16, 'D'), (18, 'B'), (20, 'A')]
Of course, you could avoid creating the whole list of grouped
, because you only ever use the first item from groupby. I leave that as an exercise for the reader.
This will also give you an X as the first item, if X is the first (set of) items. Because you say nothing about what you're doing, or why the Xs are there, but omitted, I can't figure out if that's the right behaviour or not. If it's not, then probably use my entire other recipe (in my other answer), and then take the first item from that.
Upvotes: 3
Reputation: 49826
You want (Or, you don't, as you finally explained - see my other answer):
import itertools
import functional # get it from pypi
letters=['A','A','A','A','A','A','A','A','A','A','A','A','B','C','C','X','D','X','B','B','A','A','A','A']
grouped = [(len(list(g)),k) for k,g in (itertools.groupby(letters))]
#=> [(12, 'A'), (1, 'B'), (2, 'C'), (1, 'D'), (2, 'B'), (4, 'A')]
#-1 to take this from counts to indices
filter(lambda (a,b): b!='X',functional.scanl(lambda (a,b),(c,d): (a+c,d), (-1,'X'), grouped))
#=> [(11, 'A'), (12, 'B'), (14, 'C'), (16, 'D'), (19, 'B'), (23, 'A')]
This gives you the last index of each letter run, other than Xs. If you want the first index after the relevant letter, then switch the -1 to 0.
scanl
is a reduce which returns intermediate results.
As a general rule, it makes sense to either filter first or last, unless that is for some reason expensive, or the filtering can easily be accomplished without increasing complexity.
Also, your code is relatively hard to read and understand, because you iterate by index. That's unusual in python, unless manipulating the index numerically. If you're visiting every item, it's usual to iterate directly.
Also, why do you want this particular format? It's usual to have the format as (unique item,data)
because that can easily be placed in a dict
.
Upvotes: 2
Reputation: 71
Your question is a bit confusing, but this code should do what you want.
firstChangeFound = False
for i in range(len(letters)):
if letters[i]!='X' and letters[i]!=letters[i-1]:
if not firstChangeFound:
result.append((i-1, letters[i-1])) #Grab the last occurrence of the first character
result.append((i, letters[i]))
firstChangeFound = True
else:
result.append((i, letters[i]))
Upvotes: 2
Reputation: 9237
With an aid of dictionary to keep running time linear in number of input, here is a solution:
letters=['A','A','A','A','A','A','A','A','A','A','A','A','B','C','C','X','D','X','B','B','A','A','A','A']
def f(letters):
result = []
added = {}
for i in range(len(letters)):
if (i+1 == len(letters)):
break
if letters[i+1]!='X' and letters[i+1]!=letters[i]:
if(i not in added and letters[i]!='X'):
result.append((i, letters[i]))
added[i] = letters[i]
if(i+1 not in added):
result.append((i+1, letters[i+1]))
added[i+1] = letters[i+1]
return result
Basically, my the solution always tries to add both indices where a change occurred. But the dictionary (which has constant time lookup tells us if we already added the element or not to exclude duplicates). This takes care of adding the first element. Otherwise you can use an if statement to indicate first round which will only run once. However, I argue that this solution has same running time. As long as you do not check if you added the element by looking up the list itself (since this is linear time lookup at worst), this will result in O(n^2) time which is bad!
Upvotes: 0
Reputation: 2828
letters=['A','A','A','A','A','A','A','A','A','A','A','A','B','C','C','X','D','X','B','B','A','A','A','A']
# 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
prev = letters[0]
result = []
for i in range(len(letters)):
if prev!=letters[i]:
result.append((i-1,prev))
if letters[i]!='X':
prev = letters[i]
else:
prev = letters[i+1]
result.append((len(letters)-1,letters[-1]))
print result
RESULTS IN: (Not OP's desired results, sorry I must have misunderstood. see JSutton's ans)
[(11,'A'), (12,'B'), (14,'C'), (16,'D'), (19,'B'), (23,'A')]
which is actually the index of the last instance of a letter before they change or the list ends.
Upvotes: 0