Cave

Reputation: 201

How to separate a string into multiple lines of individual characters

I am not sure if the title of this question is appropriate, so anyone is welcome to edit it. Thank you!

Suppose I have a protein sequence stored as a string:

seq='MIGQFGL'

How can I convert it to something like this:

MET 1
ILE 2
GLY 3
GLN 4
PHE 5
GLY 6
LEU 7

This is what I have tried:

f= open("protein.seq", "w")
seq = 'MIGQFGL'

d = {'C': 'CYS', 'D': 'ASP', 'S': 'SER', 'Q': 'GLN', 'K': 'LYS',
 'I': 'ILE', 'P': 'PRO', 'T': 'THR', 'F': 'PHE', 'N': 'ASN',
 'G': 'GLY', 'H': 'HIS', 'L': 'LEU', 'R': 'ARG', 'W': 'TRP',
 'A': 'ALA', 'V':'VAL', 'E': 'GLU', 'Y': 'TYR', 'M': 'MET'}

sp = list(seq)
rep = '\n'.join(d.get(e,e) for e in sp) #to replace the items in list 'sp' with corresponding dictionary values

no = list(range(1,8))
n = '\n'.join(str(x) for x in no)

line = "{}\t{}\n".format(rep,n)
f.write(line)

But this is what I got:

MET
ILE
GLY
GLN
PHE
GLY
LEU 1
2
3
4
5
6
7

So, I changed this line:

line = "{}\t{}\n".format(rep,n) 

to:

line = "{}\t{}\n".format(zip(rep,n)) 

But I got:

Traceback (most recent call last):
  File "protein.py", line 15, in <module>
    line = "{}\t{}\n".format(zip(rep,n))
IndexError: tuple index out of range

What am I doing wrong? Thanks in advance!

NB: I use Python 3.

Upvotes: 1

Views: 75

Answers (4)

aydow

Reputation: 3801

Using enumerate will get you what you want:

rep = '\n'.join('{0} {1}'.format(d.get(s,s), i+1) for i, s in enumerate(seq))

Also, it is best practice to use with when handling files, as it is both safer and neater, e.g.

with open('protein.seq', 'w') as f:
    f.write(rep)
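
Putting the two pieces together, a minimal self-contained sketch might look like the following. The names THREE_LETTER and format_residues are only illustrative and do not come from the question; the mapping itself is the one posted there.

```python
# Illustrative sketch: THREE_LETTER and format_residues are hypothetical names.
THREE_LETTER = {
    'C': 'CYS', 'D': 'ASP', 'S': 'SER', 'Q': 'GLN', 'K': 'LYS',
    'I': 'ILE', 'P': 'PRO', 'T': 'THR', 'F': 'PHE', 'N': 'ASN',
    'G': 'GLY', 'H': 'HIS', 'L': 'LEU', 'R': 'ARG', 'W': 'TRP',
    'A': 'ALA', 'V': 'VAL', 'E': 'GLU', 'Y': 'TYR', 'M': 'MET',
}


def format_residues(seq, table=THREE_LETTER):
    """Return one 'NAME index' line per residue, numbering from 1."""
    return '\n'.join(
        # Unknown letters fall back to themselves, as d.get(s, s) does above.
        '{0} {1}'.format(table.get(s, s), i)
        for i, s in enumerate(seq, 1)
    )


result = format_residues('MIGQFGL')
print(result)
```

The returned string can then be passed to f.write inside the with block shown above.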

Upvotes: 3

whackamadoodle3000

Reputation: 6748

Try this small fix of your code:

seq = 'MIGQFGL'

d = {'C': 'CYS', 'D': 'ASP', 'S': 'SER', 'Q': 'GLN', 'K': 'LYS',
 'I': 'ILE', 'P': 'PRO', 'T': 'THR', 'F': 'PHE', 'N': 'ASN',
 'G': 'GLY', 'H': 'HIS', 'L': 'LEU', 'R': 'ARG', 'W': 'TRP',
 'A': 'ALA', 'V': 'VAL', 'E': 'GLU', 'Y': 'TYR', 'M': 'MET'}

sp = list(seq)
rep = [d.get(e, e) for e in sp]  # replace each one-letter code with its three-letter name

no = list(range(1, len(seq) + 1))  # residue numbers 1..len(seq) instead of a hard-coded 8
n = [str(x) for x in no]

# keep rep and n as lists so zip pairs whole items, then join one pair per line
line = '\n'.join([e[0] + "  " + e[1] for e in zip(rep, n)])

with open("protein.seq", "w") as f:  # the file is closed automatically
    f.write(line)
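
For context on why the attempt in the question failed, here is a small sketch (the variable names are made up for illustration). The template has two {} placeholders, but format(zip(rep, n)) passes a single argument, the zip object itself, so the second placeholder has nothing to fill it. Separately, zipping two newline-joined strings pairs single characters rather than whole lines.

```python
template = "{}\t{}\n"
pair = ("MET", "1")

# One argument for two placeholders: str.format raises IndexError.
try:
    template.format(pair)
    failed = False
except IndexError:
    failed = True

# Unpacking the pair supplies both positional arguments.
line = template.format(*pair)

# Zipping joined strings walks them character by character.
chars = list(zip("MET\nILE", "1\n2"))

print(failed)       # True
print(repr(line))   # 'MET\t1\n'
print(chars)        # [('M', '1'), ('E', '\n'), ('T', '2')]
```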

Upvotes: 2

Vishal Khichadiya

Reputation: 430

I hope you're looking for something like this:

[(i, j) for i, j in enumerate(seq, 1)]

Results:

[(1, 'M'), (2, 'I'), (3, 'G'), (4, 'Q'), (5, 'F'), (6, 'G'), (7, 'L')]

Tuples keep the number and the letter in a fixed order, which sets do not guarantee.

Upvotes: 1

ForceBru

Reputation: 44838

You're very close:

result = '\n'.join(f"{d.get(e,e)} {i}" for i, e in enumerate(seq, 1))
  • you can iterate over the individual characters of a string directly, so list(seq) is not needed
  • enumerate(seq, i) is an iterator that yields pairs of the form (i, seq[0]), (i + 1, seq[1]), ...
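
As a quick illustration of the second point (a sketch only; the names pairs and shifted are not from the question), the start argument changes the counter but not where iteration begins:

```python
seq = 'MIGQFGL'

# Counting starts at 1, but iteration still begins at seq[0].
pairs = list(enumerate(seq, 1))
print(pairs[:3])    # [(1, 'M'), (2, 'I'), (3, 'G')]

# A different start value only shifts the numbers.
shifted = list(enumerate(seq, 5))
print(shifted[:2])  # [(5, 'M'), (6, 'I')]
```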

Upvotes: 2
