mlenthusiast

Reputation: 1194

Find unique sub-strings and preserve sequence

I'm trying to write a method that takes a string, for example a DNA string, and outputs each run of repeated letters as a count followed by the letter, preserving the order in which the runs appear.

For example:

>>> dna = 'AABBBGGGKKDDDD'
>>> substring(dna)  # some method
'2A3B3G2K4D'

I'm guessing I can start with an empty array and write a for loop that iterates through every letter: while the letter is the same as the previous one it increments a count, and when the letter changes it appends the count followed by the letter. I'm just not sure how to write that syntactically. Any help would be appreciated :)

Upvotes: 1

Views: 55

Answers (2)

RoadRunner

Reputation: 26315

itertools.groupby() fits this task well: it groups consecutive equal characters, so the length of each group is the count you need:

from itertools import groupby

def get_sequence(dna):
    return ''.join(str(len(tuple(g))) + k for k, g in groupby(dna))

print(get_sequence('AABBBGGGKKDDDD'))
# 2A3B3G2K4D
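
To see why this works, it can help to look at what groupby() yields before the pieces are joined. The following is only a minimal sketch; the helper name show_runs is hypothetical and not part of the answer above. It counts each group with sum() instead of building a tuple, which is an alternative way to get the run length.

```python
from itertools import groupby


def show_runs(dna):
    # Hypothetical helper: return (letter, run length) pairs in order.
    # groupby() only groups *consecutive* equal items, so a letter that
    # reappears later starts a new run.
    return [(k, sum(1 for _ in g)) for k, g in groupby(dna)]


print(show_runs('AABBBGGGKKDDDD'))
# [('A', 2), ('B', 3), ('G', 3), ('K', 2), ('D', 4)]
print(show_runs('AABA'))
# [('A', 2), ('B', 1), ('A', 1)]
```

Because the input is not sorted first, the original order of the runs is preserved.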

Upvotes: 2

Martlark

Reputation: 14581

Here is a quick example that uses a plain loop and keeps track of the previous character.

dna = 'AABBBGGGKKDDDD'


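def get_sequence(dna):
    sequence = ''
    previous_c = ''
    count = 0
    for c in dna:
        if c == previous_c:
            # Still inside the same run of letters.
            count += 1
        else:
            # A new letter starts: write out the finished run, if any.
            if previous_c:
                sequence += '{}{}'.format(count, previous_c)
            count = 1
            previous_c = c
    # Write out the last run (skipped when dna is empty).
    if count > 0:
        sequence += '{}{}'.format(count, previous_c)
    return sequence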


print(get_sequence('A'))
print(get_sequence(''))
print(get_sequence(dna))

Output:

1A

2A3B3G2K4D
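
The question mentions starting from an empty array, so here is a sketch of that variant as well. The name get_sequence_list is hypothetical; it assumes the same input and output format as above and simply stores the runs in a list before joining them.

```python
def get_sequence_list(dna):
    # Hypothetical variant: each entry in runs is [letter, count].
    runs = []
    for c in dna:
        if runs and runs[-1][0] == c:
            # Same letter as the current run: extend it.
            runs[-1][1] += 1
        else:
            # Different letter (or first letter): start a new run.
            runs.append([c, 1])
    return ''.join('{}{}'.format(count, letter) for letter, count in runs)


print(get_sequence_list('AABBBGGGKKDDDD'))
# 2A3B3G2K4D
```

Keeping the runs in a list avoids the separate final flush after the loop, since the last run is already stored when the loop ends.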

Upvotes: 1
