Reputation: 29
I have a question about converting protein sequences into SVM input format. I recently found a tutorial that converts a protein sequence into a sparse binary representation (example given below). How can I convert my 7-mer sequences into this format with a Python script?
Amino acid  Binary code
A 10000000000000000000
C 01000000000000000000
D 00100000000000000000
E 00010000000000000000
F 00001000000000000000
G 00000100000000000000
H 00000010000000000000
I 00000001000000000000
K 00000000100000000000
L 00000000010000000000
M 00000000001000000000
N 00000000000100000000
P 00000000000010000000
Q 00000000000001000000
R 00000000000000100000
S 00000000000000010000
T 00000000000000001000
V 00000000000000000100
W 00000000000000000010
Y 00000000000000000001
Example with a 2-mer peptide:
Peptide  Sparse binary encoding of peptide         SVM input
AD       1000000000000000000000100000000000000000  +1 1:1 23:1
YC       0000000000000000000101000000000000000000  -1 20:1 22:1
It should generate an out.txt file containing the SVM input for each peptide, as shown below.
+1 1:1 23:1
-1 20:1 22:1
Thanks.
Upvotes: 0
Views: 389
Reputation: 28259
bin_dict = {
'A':'10000000000000000000',
'C':'01000000000000000000',
'D':'00100000000000000000',
'E':'00010000000000000000',
'F':'00001000000000000000',
'G':'00000100000000000000',
'H':'00000010000000000000',
'I':'00000001000000000000',
'K':'00000000100000000000',
'L':'00000000010000000000',
'M':'00000000001000000000',
'N':'00000000000100000000',
'P':'00000000000010000000',
'Q':'00000000000001000000',
'R':'00000000000000100000',
'S':'00000000000000010000',
'T':'00000000000000001000',
'V':'00000000000000000100',
'W':'00000000000000000010',
'Y':'00000000000000000001'
}
seq = "ACDE"
bin_string = ''
# Concatenate the 20-bit code of each residue, in sequence order
for letter in seq:
    bin_string += bin_dict[letter]
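This generates a string containing the binary representation of seq. The SVM input is then the list of 1-based positions of the '1' characters in that string, each written as position:1.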
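The last step, going from the binary string to the index:value lines and writing out.txt, can be sketched as follows. This is only a minimal sketch under some assumptions: the class label (+1 or -1) is supplied alongside each peptide, every residue is one of the 20 letters in the table, and the names peptide_to_svm and write_svm_file are made up for illustration. It computes the feature index directly instead of building the long binary string first.

```python
# Residue order taken from the table in the question.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def peptide_to_svm(peptide, label):
    """Return one SVM input line, e.g. '+1 1:1 23:1' for ('AD', '+1')."""
    features = []
    for pos, aa in enumerate(peptide.upper()):
        # Each residue position owns a block of 20 features; indices are 1-based.
        # An unknown letter (e.g. 'X') raises KeyError here.
        feature_index = pos * len(AMINO_ACIDS) + INDEX[aa] + 1
        features.append(f"{feature_index}:1")
    return " ".join([label] + features)


def write_svm_file(labelled_peptides, path="out.txt"):
    """Write one SVM input line per (peptide, label) pair."""
    with open(path, "w") as handle:
        for peptide, label in labelled_peptides:
            handle.write(peptide_to_svm(peptide, label) + "\n")


# The two peptides from the question, plus a hypothetical 7-mer.
examples = [("AD", "+1"), ("YC", "-1"), ("ACDEFGH", "+1")]
for peptide, label in examples:
    print(peptide_to_svm(peptide, label))
```

Calling write_svm_file(examples) would write the same lines to out.txt. For a 7-mer each line has seven features, with indices running from 1 to 140.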
Upvotes: 1