sam

Reputation: 655

Encode a DNA string so that each run of consecutive identical characters is replaced by its count followed by the character

I need help writing Python code that returns the output_string shown in the examples below. A character that is not repeated is left as is, without a count.

Example 1:

input_string = "AAABCCCCDDA"
output_string = "3AB4C2DA"

Example 2:

input_string = "ABBBBCCDDDDAAAAA"
output_string = "A4B2C4D5A"

Upvotes: 0

Views: 36

Answers (2)

SergFSM

Reputation: 1491

A regex can also do the trick:

from re import sub

dna = "AAABCCCCDDA"
sub(r'(\w)\1+', lambda m: str(len(m[0])) + m[1], dna)  # '3AB4C2DA'
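
To unpack the pattern: the group captures one character and `\1+` requires at least one repeat of it, so only runs of two or more match and single characters pass through unchanged. Note that `\w` also matches digits and underscores, which could make the output ambiguous if the input ever contained digits. Below is a minimal sketch of the same idea wrapped in a function. The name `encode_runs_regex` is hypothetical, and restricting the pattern to `[A-Za-z]` is my assumption about the input, not something the question states.

```python
import re

# Assumption: the input consists of ASCII letters only.
_RUN = re.compile(r'([A-Za-z])\1+')


def encode_runs_regex(dna):
    """Replace each run of 2+ identical letters with '<count><letter>'."""
    # m.group(0) is the whole run, m.group(1) is the repeated letter.
    return _RUN.sub(lambda m: f"{len(m.group(0))}{m.group(1)}", dna)


print(encode_runs_regex("AAABCCCCDDA"))       # 3AB4C2DA
print(encode_runs_regex("ABBBBCCDDDDAAAAA"))  # A4B2C4D5A
```

Precompiling the pattern only matters if the function is called many times; for a one-off call, `re.sub` with the pattern string behaves the same.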

Upvotes: 0

I'mahdi

Reputation: 24049

You can use itertools.groupby.

In Python 3.8+, you can use the walrus operator (:=) to write this as a one-liner.

>>> from itertools import groupby
>>> input_string = "ABBBBCCDDDDAAAAA"
>>> ''.join(f"{len_g}{k}" if (len_g := len(list(g))) > 1 else k for k, g in groupby(input_string))
'A4B2C4D5A'

In Python < 3.8 (f-strings still require 3.6+):

from itertools import groupby

input_string = "AAABCCCCDDA"

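st = ''
for k, g in groupby(input_string):
    len_g = len(list(g))  # length of the current run of identical characters
    if len_g > 1:
        st += f"{len_g}{k}"
    else:
        st += k  # a single character is emitted without a count

print(st)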

Output: 3AB4C2DA
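
If you need this in more than one place, the groupby approach can be wrapped in a function. This is only a sketch: the name `encode_runs` is hypothetical, and it collects the pieces in a list and joins them once instead of concatenating strings in the loop.

```python
from itertools import groupby


def encode_runs(s):
    """Encode runs of identical characters as '<count><char>'.

    Runs of length 1 are emitted as the bare character.
    """
    parts = []
    for char, group in groupby(s):
        count = sum(1 for _ in group)  # run length without building a list
        parts.append(f"{count}{char}" if count > 1 else char)
    return "".join(parts)


print(encode_runs("AAABCCCCDDA"))       # 3AB4C2DA
print(encode_runs("ABBBBCCDDDDAAAAA"))  # A4B2C4D5A
```

Both examples from the question produce the expected output_string, and an empty input returns an empty string.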

Upvotes: 1
