GotYa
GotYa

Reputation: 121

Python inserting spaces in string

Alright, I'm working on a little project for school, a 6-frame translator. I won't go into too much detail, I'll just describe what I wanted to add. The normal output would be something like:

TTCPTISPALGLAWS_DLGTLGFMSYSANTASGETLVSLYQLGLFEM_VVSYGRTKYYLICP_LFHLSVGFVPSD

The important part of this string are the M and the _ (the start and stop codons, biology stuff). What I wanted to do was highlight these like so:

TTCPTISPALGLAWS_DLGTLGF 'MSYSANTASGETLVSLYQLGLFEM_' VVSYGRTKYYLICP_LFHLSVGFVPSD

Now here is where (for me) it gets tricky, I got my output to look like this (adding a space and a ' to highlight the start and stop). But it only does this once, for the first start and stop it finds. If there are any other M....._ combinations it won't highlight them.

Here is my current code, attempting to make it highlight more than once:

def start_stop(translation):
index_2 = 0
while True:
    if 'M' in translation[index_2::1]:
        index_1 = translation[index_2::1].find('M')
        index_2 = translation[index_1::1].find('_') + index_1
        new_translation = translation[:index_1] + " '" + \
                          translation[index_1:index_2 + 1] + "' " +\
                          translation[index_2 + 1:]
    else:
        break
    return new_translation

I really thought this would do it, guess not. So now I find myself being stuck. If any of you are willing to try and help, here is a randomly generated string with more than one M....._ set:

'TTCPTISPALGLAWS_DLGTLGFMSYSANTASGETLVSLYQLGLFEM_VVSYGRTKYYLICP_LFHLSVGFVPSDGRRLTLYMPPARRLATKSRFLTPVISSG_DKPRHNPVARSQFLNPLVRPNYSISASKSGLRLVLSYTRLSLGINSLPIERLQYSVPAPAQITP_IPEHGNARNFLPEWPRLLISEPAPSVNVPCSVFVVDPEHPKAHSKPDGIANRLTFRWRLIG_VFFHNAL_VITHGYSRVDILLPVSRALHVHLSKSLLLRSAWFTLRNTRVTGKPQTSKT_FDPKATRVHAIDACAE_QQH_PDSGLRFPAPGSCSEAIRQLMI'

Thank you to anyone willing to help :)

Upvotes: 3

Views: 522

Answers (2)

Aaditya Ura
Aaditya Ura

Reputation: 12669

You just require little slice of 'slice' module , you don't need any external module :

Python string have a method called 'index' just use it.

string_1='TTCPTISPALGLAWS_DLGTLGFMSYSANTASGETLVSLYQLGLFEM_VVSYGRTKYYLICP_LFHLSVGFVPSD'

before=string_1.index('M')
after=string_1[before:].index('_')
print('{}  {} {}'.format(string_1[:before],string_1[before:before+after+1],string_1[before+after+1:]))

output:

TTCPTISPALGLAWS_DLGTLGF  MSYSANTASGETLVSLYQLGLFEM_ VVSYGRTKYYLICP_LFHLSVGFVPSD

Upvotes: 0

tzaman
tzaman

Reputation: 47770

Regular expressions are pretty handy here:

import re
sequence = "TTCP...."
highlighted = re.sub(r"(M\w*?_)", r" '\1' ", sequence)

# Output:
"TTCPTISPALGLAWS_DLGTLGF 'MSYSANTASGETLVSLYQLGLFEM_' VVSYGRTKYYLICP_LFHLSVGFVPSDGRRLTLY 'MPPARRLATKSRFLTPVISSG_' DKPRHNPVARSQFLNPLVRPNYSISASKSGLRLVLSYTRLSLGINSLPIERLQYSVPAPAQITP_IPEHGNARNFLPEWPRLLISEPAPSVNVPCSVFVVDPEHPKAHSKPDGIANRLTFRWRLIG_VFFHNAL_VITHGYSRVDILLPVSRALHVHLSKSLLLRSAWFTLRNTRVTGKPQTSKT_FDPKATRVHAIDACAE_QQH_PDSGLRFPAPGSCSEAIRQLMI"

Regex explanation:
We look for an M followed by any number of "word characters" \w* then an _, using the ? to make it a non-greedy match (otherwise it would just make one group from the first M to the last _).
The replacement is the matched group (\1 indicates "first group", there's only one), but surrounded by spaces and quotes.

Upvotes: 4

Related Questions