Harpal

Reputation: 12587

Python: String formatting where the line length can vary

So I have the script below:

def single_to_tripple(res):
    aa = {'R':'ARG','H':'HIS','K':'LYS','D':'ASP','E':'GLU','S':'SER','T':'THR','N':'ASN','Q':'GLN','C':'CYS','U':'SEC','G':'GLY','P':'PRO','A':'ALA','I':'ILE','L':'LEU','M':'MET','F':'PHE','W':'TRP','Y':'TYR','V':'VAL'}
    return(aa[res])
seq = 'ASALKDYYAIMGVKPTDDLKTIKTAYRRLARKYHPDVSKEPDAEARFKEVAEAWEVLSDEQRRAEYDQMWQHRNDPQFNRQFHHGDGQSFNAEDFDDIFSSIFGQHARQSRQRPATRGHDIEIEVAVFLEETLTEHKRTISYNLPVYNAFGMIEQEIPKTLNVKIPAGVGNGQRIRLKGQGTPGENGGPNGDLWLVIHIAPHPLFDIVGQDLEIVVPVSPWEAALGAKVTVPTLKESILLTIPPGSQAGQRLRVKGKGLVSKKQTGDLYAVLKIVMPPKPDENTAALWQQLADAQSSFDPRKDWGKA'
length = len(seq)

for i,v in enumerate(xrange(0,len(seq),13)):
    line = seq[v:v+13]
    out_line = ('{:<3} '*13).format(single_to_tripple(line[0]),single_to_tripple(line[1]),single_to_tripple(line[2]),single_to_tripple(line[3]),single_to_tripple(line[4]),single_to_tripple(line[5]),single_to_tripple(line[6]),single_to_tripple(line[7]),single_to_tripple(line[8]),single_to_tripple(line[9]),single_to_tripple(line[10]),single_to_tripple(line[11]),single_to_tripple(line[12]))
    print out_line

I am using the script to slice the seq string every 13 characters and then convert each character in the slice from its single-letter code to its three-letter code with single_to_tripple. The output needs to contain 13 columns separated by a space. The problem occurs at the last slice if it does not contain 13 elements. How can I catch this and format the string as I usually would?

I use enumerate in my for loop because I will need to add the line numbers in later.

My current code outputs:

ALA SER ALA LEU LYS ASP TYR TYR ALA ILE MET GLY VAL 
LYS PRO THR ASP ASP LEU LYS THR ILE LYS THR ALA TYR 
ARG ARG LEU ALA ARG LYS TYR HIS PRO ASP VAL SER LYS 
GLU PRO ASP ALA GLU ALA ARG PHE LYS GLU VAL ALA GLU 
ALA TRP GLU VAL LEU SER ASP GLU GLN ARG ARG ALA GLU 
TYR ASP GLN MET TRP GLN HIS ARG ASN ASP PRO GLN PHE 
ASN ARG GLN PHE HIS HIS GLY ASP GLY GLN SER PHE ASN 
ALA GLU ASP PHE ASP ASP ILE PHE SER SER ILE PHE GLY 
GLN HIS ALA ARG GLN SER ARG GLN ARG PRO ALA THR ARG 
GLY HIS ASP ILE GLU ILE GLU VAL ALA VAL PHE LEU GLU 
GLU THR LEU THR GLU HIS LYS ARG THR ILE SER TYR ASN 
LEU PRO VAL TYR ASN ALA PHE GLY MET ILE GLU GLN GLU 
ILE PRO LYS THR LEU ASN VAL LYS ILE PRO ALA GLY VAL 
GLY ASN GLY GLN ARG ILE ARG LEU LYS GLY GLN GLY THR 
PRO GLY GLU ASN GLY GLY PRO ASN GLY ASP LEU TRP LEU 
VAL ILE HIS ILE ALA PRO HIS PRO LEU PHE ASP ILE VAL 
GLY GLN ASP LEU GLU ILE VAL VAL PRO VAL SER PRO TRP 
GLU ALA ALA LEU GLY ALA LYS VAL THR VAL PRO THR LEU 
LYS GLU SER ILE LEU LEU THR ILE PRO PRO GLY SER GLN 
ALA GLY GLN ARG LEU ARG VAL LYS GLY LYS GLY LEU VAL 
SER LYS LYS GLN THR GLY ASP LEU TYR ALA VAL LEU LYS 
ILE VAL MET PRO PRO LYS PRO ASP GLU ASN THR ALA ALA 
LEU TRP GLN GLN LEU ALA ASP ALA GLN SER SER PHE ASP 
Traceback (most recent call last):
  File "make_seq_res.py", line 10, in <module>
    out_line = ('{:<3} '*13).format(single_to_tripple(line[0]),single_to_tripple(line[1]),single_to_tripple(line[2]),single_to_tripple(line[3]),single_to_tripple(line[4]),single_to_tripple(line[5]),single_to_tripple(line[6]),single_to_tripple(line[7]),single_to_tripple(line[8]),single_to_tripple(line[9]),single_to_tripple(line[10]),single_to_tripple(line[11]),single_to_tripple(line[12]))
IndexError: string index out of range

Upvotes: 1

Views: 184

Answers (3)

Martijn Pieters

Reputation: 1121774

You just need to join your strings together, no formatting required:

for i,v in enumerate(xrange(0,len(seq),13)):
    line = seq[v:v+13]
    print ' '.join([single_to_tripple(part) for part in line])

No need to overcomplicate things here. :-)

Note that with str.join() a list comprehension is preferable to a generator expression (so include the [...]): .join() converts its argument to a list anyway, which makes the list comprehension faster.

Result (last 3 lines):

ILE VAL MET PRO PRO LYS PRO ASP GLU ASN THR ALA ALA
LEU TRP GLN GLN LEU ALA ASP ALA GLN SER SER PHE ASP
PRO ARG LYS ASP TRP GLY LYS ALA
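For reference, here is a minimal self-contained sketch of the same idea. It assumes `range` and `print()` in place of `xrange` and the print statement so that it runs on both Python 2 and 3; `format_rows` and the shortened `AA` table are hypothetical names used only for illustration. The point is that slicing past the end of a string simply returns a shorter chunk, so the last row needs no special handling.

```python
# Three-letter codes for a handful of residues; the full table is in the question.
AA = {'A': 'ALA', 'S': 'SER', 'L': 'LEU', 'K': 'LYS', 'D': 'ASP', 'Y': 'TYR'}

def format_rows(seq, width=13):
    """Return one space-separated row of three-letter codes per `width` residues."""
    rows = []
    for start in range(0, len(seq), width):
        # Slicing past the end of the string just yields a shorter chunk.
        chunk = seq[start:start + width]
        rows.append(' '.join([AA[res] for res in chunk]))
    return rows

# Nine residues in rows of four: the last row holds a single residue.
for row in format_rows('ASALKDYYA', width=4):
    print(row)
```

With the full mapping and `width=13` this should reproduce the output shown above, including the short final line.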

You could also use an itertools-based grouper to simplify your loop:

from itertools import izip_longest

def grouper(n, iterable, padvalue=None):
    "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return izip_longest(*[iter(iterable)]*n, fillvalue=padvalue)

aa = {'R':'ARG','H':'HIS','K':'LYS','D':'ASP','E':'GLU','S':'SER','T':'THR','N':'ASN','Q':'GLN','C':'CYS','U':'SEC','G':'GLY','P':'PRO','A':'ALA','I':'ILE','L':'LEU','M':'MET','F':'PHE','W':'TRP','Y':'TYR','V':'VAL', None: ''}
def single_to_tripple(res):
    return(aa[res])

for line in grouper(13, seq):
    print ' '.join([single_to_tripple(part) for part in line])

Here I changed your single_to_tripple() function by moving the mapping out of the function (no need to rebuild it on every call) and adding a None key (the grouper pads the last group with None values).
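If you are on Python 3, `izip_longest` is named `zip_longest`. The sketch below is one possible variant under that assumption; `format_groups` and the shortened `AA` table are hypothetical names for illustration. Instead of mapping the `None` padding to an empty string, which leaves trailing spaces on the last line after joining, it filters the padding out.

```python
from itertools import zip_longest  # Python 3 name; Python 2 calls it izip_longest

# Three-letter codes for a handful of residues; the full table is in the question.
AA = {'A': 'ALA', 'S': 'SER', 'L': 'LEU', 'K': 'LYS'}

def grouper(n, iterable, padvalue=None):
    """Collect items into tuples of length n, padding the last one with padvalue."""
    return zip_longest(*[iter(iterable)] * n, fillvalue=padvalue)

def format_groups(seq, width=13):
    """Return one space-separated row per group, skipping the None padding."""
    rows = []
    for group in grouper(width, seq):
        # Drop the padding so the last row carries no trailing separators.
        rows.append(' '.join([AA[res] for res in group if res is not None]))
    return rows

for row in format_groups('ASALK', width=2):
    print(row)
```

Whether to filter the padding or map it to `''` is a matter of taste; filtering only matters if trailing whitespace is a problem for whatever reads the output.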

Upvotes: 2

Fenikso

Reputation: 9451

You can save your line length and then use it:

def single_to_tripple(res):
    aa = {'R':'ARG','H':'HIS','K':'LYS','D':'ASP','E':'GLU','S':'SER','T':'THR','N':'ASN','Q':'GLN','C':'CYS','U':'SEC','G':'GLY','P':'PRO','A':'ALA','I':'ILE','L':'LEU','M':'MET','F':'PHE','W':'TRP','Y':'TYR','V':'VAL'}
    return(aa[res])

seq = 'ASALKDYYAIMGVKPTDDLKTIKTAYRRLARKYHPDVSKEPDAEARFKEVAEAWEVLSDEQRRAEYDQMWQHRNDPQFNRQFHHGDGQSFNAEDFDDIFSSIFGQHARQSRQRPATRGHDIEIEVAVFLEETLTEHKRTISYNLPVYNAFGMIEQEIPKTLNVKIPAGVGNGQRIRLKGQGTPGENGGPNGDLWLVIHIAPHPLFDIVGQDLEIVVPVSPWEAALGAKVTVPTLKESILLTIPPGSQAGQRLRVKGKGLVSKKQTGDLYAVLKIVMPPKPDENTAALWQQLADAQSSFDPRKDWGKA'
length = len(seq)

for i,v in enumerate(xrange(0,len(seq),13)):
    line = seq[v:v+13]
    length = len(line)
    out_line = ('{:<3} '*length).format(*[single_to_tripple(a) for a in line])
    print out_line
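One detail of this approach: repeating `'{:<3} '` leaves a trailing space at the end of every line, just like the original script. If that matters, a small variation (a sketch only; `format_chunk` and the shortened `AA` table are hypothetical names) is to join the placeholders with spaces before formatting:

```python
# Three-letter codes for a handful of residues; the full table is in the question.
AA = {'A': 'ALA', 'S': 'SER', 'L': 'LEU', 'K': 'LYS'}

def format_chunk(chunk):
    """Format one chunk with a template sized to the chunk's actual length."""
    # Build e.g. '{:<3} {:<3} {:<3}' so there is no separator after the last field.
    template = ' '.join(['{:<3}'] * len(chunk))
    return template.format(*[AA[res] for res in chunk])

print(repr(format_chunk('ASL')))
```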

Upvotes: 0

poke

Reputation: 387607

The fact that you had to type out so many variables manually should have given you a hint that you are doing way more than necessary to produce that output.

Without changing much of your original code, you could do it like this:

for i,v in enumerate(xrange(0,len(seq),13)):
    line = seq[v:v+13]
    out_line = ' '.join('{:<3}'.format(single_to_tripple(part)) for part in line)
    print out_line

As Martijn pointed out, the triplets are always three characters, so you can actually skip the formatting:

out_line = ' '.join(single_to_tripple(part) for part in line)
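Since the question mentions adding line numbers later, here is a hedged sketch of how that could look. The four-character, right-aligned number column is purely an assumption, as the question does not say what format is needed; `numbered_rows` and the shortened `AA` table are hypothetical names, and `range` is used so the code runs on Python 2 and 3 alike.

```python
# Three-letter codes for a handful of residues; the full table is in the question.
AA = {'A': 'ALA', 'S': 'SER', 'L': 'LEU', 'K': 'LYS'}

def numbered_rows(seq, width=13):
    """Return rows of three-letter codes, each prefixed with a 1-based line number."""
    rows = []
    # enumerate(..., 1) starts counting at 1 instead of 0.
    for number, start in enumerate(range(0, len(seq), width), 1):
        chunk = seq[start:start + width]
        codes = ' '.join([AA[res] for res in chunk])
        # Assumed layout: number right-aligned in 4 columns, then the codes.
        rows.append('{:>4} {}'.format(number, codes))
    return rows

for row in numbered_rows('ASALK', width=2):
    print(row)
```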

Upvotes: 3
