Reputation: 1194
I'm trying to write a method that takes a string, for example a DNA string, and outputs each run of repeated characters as a count followed by the character, preserving the original order.
For example:
>>> dna = 'AABBBGGGKKDDDD'
>>> substring(dna)  # some method
'2A3B3G2K4D'
My guess is that I can start with an empty list, loop over every letter, keep a count while the letter stays the same, and append the count and the letter once it changes. I'm just not sure how to write that syntactically. Any help would be appreciated :)
Upvotes: 1
Views: 55
Reputation: 26315
itertools.groupby() works perfectly for this task:
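from itertools import groupby

def get_sequence(dna):
    # groupby yields (character, run) pairs for consecutive equal characters;
    # emit the run length followed by the character.
    return ''.join(str(len(tuple(g))) + k for k, g in groupby(dna))

print(get_sequence('AABBBGGGKKDDDD'))
# 2A3B3G2K4D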
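To see why this works, it may help to look at what groupby() actually produces. The snippet below is only an illustrative sketch (the variable names runs and interleaved are mine, not part of the answer): it lists each run as a (character, length) pair and shows that only adjacent repeats are grouped.

```python
from itertools import groupby

dna = 'AABBBGGGKKDDDD'

# groupby yields (key, group_iterator) for each run of consecutive equal items.
runs = [(key, len(list(group))) for key, group in groupby(dna)]
print(runs)

# Only adjacent duplicates are merged, so the second 'A' run in 'AABA'
# is reported separately from the first.
interleaved = [(key, len(list(group))) for key, group in groupby('AABA')]
print(interleaved)
```

Because the input is not sorted first, the original order of the runs is preserved, which is what the question asks for.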
Upvotes: 2
Reputation: 14581
Here is a quick example.
dna = 'AABBBGGGKKDDDD'

def get_sequence(dna):
    sequence = ''
    previous_c = ''
    count = 0
    for c in dna:
        if c == previous_c:
            # Same character as before: extend the current run.
            count += 1
        else:
            # A new run starts: write out the previous run, if there was one.
            if len(previous_c) > 0:
                sequence += '{}{}'.format(count, previous_c)
            count = 1
            previous_c = c
    # Write out the final run (skipped when the input is empty).
    if count > 0:
        sequence += '{}{}'.format(count, previous_c)
    return sequence

print(get_sequence('A'))
print(get_sequence(''))
print(get_sequence(dna))
Output:
1A

2A3B3G2K4D
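The question mentions starting from an empty list, and the same loop can be written that way. The following is a rough sketch of that variant, not a replacement for the code above; encode_runs is a hypothetical name chosen for illustration. It collects the pieces in a list and joins them once at the end.

```python
def encode_runs(text):
    """Run-length encode text, e.g. 'AAB' -> '2A1B' (hypothetical helper)."""
    if not text:
        return ''
    pieces = []
    current, count = text[0], 1
    for char in text[1:]:
        if char == current:
            # Still inside the same run.
            count += 1
        else:
            # Run ended: record it and start counting the new character.
            pieces.append(f'{count}{current}')
            current, count = char, 1
    # Record the last run, which the loop never closes.
    pieces.append(f'{count}{current}')
    return ''.join(pieces)

print(encode_runs('AABBBGGGKKDDDD'))
```

Handling the empty string up front removes the need for the sentinel checks on previous_c and count.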
Upvotes: 1