JAZs

Reputation: 29

Python script to convert peptide sequence into SVM input format

I have a question about converting protein sequences into SVM input format. I recently found a tutorial that shows how to convert a protein sequence into a sparse binary representation (example given below). How can I convert my 7-mer sequences into this format with a Python script?

Amino acid          Binary code
A                   10000000000000000000
C                   01000000000000000000
D                   00100000000000000000
E                   00010000000000000000
F                   00001000000000000000
G                   00000100000000000000
H                   00000010000000000000
I                   00000001000000000000
K                   00000000100000000000
L                   00000000010000000000
M                   00000000001000000000
N                   00000000000100000000
P                   00000000000010000000
Q                   00000000000001000000
R                   00000000000000100000
S                   00000000000000010000
T                   00000000000000001000
V                   00000000000000000100
W                   00000000000000000010
Y                   00000000000000000001

Example with 2mer peptide

Peptide            Sparse binary encoding of peptide        SVM input

AD            1000000000000000000000100000000000000000    +1 1:1 23:1
YC            0000000000000000000101000000000000000000    -1 20:1 22:1

It should generate an out.txt file containing the SVM input for each peptide, as shown below.

+1 1:1 23:1
-1 20:1 22:1

Thanks.

Upvotes: 0

Views: 389

Answers (1)

yemu

Reputation: 28259

bin_dict = {
'A':'10000000000000000000',
'C':'01000000000000000000',
'D':'00100000000000000000',
'E':'00010000000000000000',
'F':'00001000000000000000',
'G':'00000100000000000000',
'H':'00000010000000000000',
'I':'00000001000000000000',
'K':'00000000100000000000',
'L':'00000000010000000000',
'M':'00000000001000000000',
'N':'00000000000100000000',
'P':'00000000000010000000',
'Q':'00000000000001000000',
'R':'00000000000000100000',
'S':'00000000000000010000',
'T':'00000000000000001000',
'V':'00000000000000000100',
'W':'00000000000000000010',
'Y':'00000000000000000001'
}

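seq = "ACDE"
bin_string = ''
for letter in seq:
    # Append the 20-character binary code of each residue.
    bin_string += bin_dict[letter]

# Sparse SVM features: the 1-based position of every '1' in bin_string.
svm_string = ' '.join('%d:1' % (i + 1) for i, bit in enumerate(bin_string) if bit == '1')

This generates bin_string, the binary representation of seq, and svm_string, the index:1 features for the positions set to 1. Prepend the class label (+1 or -1) to svm_string to get a full SVM input line.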
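
To get all the way to the out.txt file from the question, here is a minimal sketch. It assumes each peptide comes with its class label (+1 or -1) and that the amino acid order is the one in the question's table. The names AMINO_ACIDS, peptide_to_svm and write_svm_file are made up for illustration. It computes the feature index directly instead of building the long binary string first.

```python
# Hypothetical helper names; the residue order follows the table in the question.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def peptide_to_svm(peptide, label):
    """Return one SVM input line, e.g. '+1 1:1 23:1' for ('AD', '+1')."""
    features = []
    for position, residue in enumerate(peptide):
        # Each residue owns a block of 20 features; SVM indices are 1-based.
        # str.index raises ValueError for a letter outside the 20 standard residues.
        index = position * len(AMINO_ACIDS) + AMINO_ACIDS.index(residue) + 1
        features.append("%d:1" % index)
    return label + " " + " ".join(features)


def write_svm_file(labelled_peptides, path="out.txt"):
    """Write one SVM input line per (peptide, label) pair."""
    with open(path, "w") as handle:
        for peptide, label in labelled_peptides:
            handle.write(peptide_to_svm(peptide, label) + "\n")


if __name__ == "__main__":
    # The two peptides from the question; replace with your own 7-mers and labels.
    write_svm_file([("AD", "+1"), ("YC", "-1")])
```

The same functions work for 7-mers unchanged, since the feature index only depends on the position in the peptide and the residue. How the labels are stored alongside your sequences is not stated in the question, so the list of (peptide, label) pairs is only a placeholder for however you read them in.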

Upvotes: 1
