Using Python to retrieve missing sequences - 'split' command does not work

Question

I have a set of protein sequences that were found using a piece of software, but they are shorter than the original sequences in the database. I downloaded the entire database, so I now have both the set of incomplete sequences and the original database from which they were found.

Example result from software:

>tr|E7EWP2|E7EWP2_HUMAN  Uncharacterized protein OS=Homo sapiens GN=TRIO PE=4 SV=2
KEFIMAELIQTEKAYVRDLRECMDTYLWEMTSGVE

Sequence in the database:

>tr|E7EWP2|E7EWP2_HUMAN  Uncharacterized protein OS=Homo sapiens GN=TRIO PE=4 SV=2
ARRKEFIMAELIQTEKAYVRDLRECMDTYLWEMTSGVEEIP

So the missing residues are 'ARR' at the start and 'EIP' at the end. I have around 70 incomplete sequences like this. I would like to write a Python program that can automatically retrieve the complete sequences from the database. I am really new to Python. Of course I will try to write my own code, but I would like to know if there are any libraries, such as Biopython modules, that can do this. My plan is to take the intervals from my result, expand them, and select them in the original database, but I do not know how to proceed further.
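One way to sketch this plan is to skip the interval expansion and look each incomplete sequence up by its accession (the part between the two `|` characters in the header), then take the full sequence from the database. The helper names `read_fasta` and `accession` are made up for this illustration, and the two records are the example from this question held in memory rather than read from real files. Biopython's `SeqIO.parse` could replace the hand-written parser.

```python
def read_fasta(lines):
    """Parse FASTA lines into a dict mapping header (without '>') to sequence."""
    records = {}
    header = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif line and header is not None:
            # Sequences may be wrapped over several lines, so collect the parts.
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


def accession(header):
    """Assumes UniProt-style headers: 'tr|E7EWP2|E7EWP2_HUMAN ...' -> 'E7EWP2'."""
    return header.split("|")[1]


# Stand-ins for the two files; with real files, pass an open file object instead.
database_lines = [
    ">tr|E7EWP2|E7EWP2_HUMAN  Uncharacterized protein OS=Homo sapiens GN=TRIO PE=4 SV=2",
    "ARRKEFIMAELIQTEKAYVRDLRECMDTYLWEMTSGVEEIP",
]
result_lines = [
    ">tr|E7EWP2|E7EWP2_HUMAN  Uncharacterized protein OS=Homo sapiens GN=TRIO PE=4 SV=2",
    "KEFIMAELIQTEKAYVRDLRECMDTYLWEMTSGVE",
]

database = {accession(h): seq for h, seq in read_fasta(database_lines).items()}
fragments = {accession(h): seq for h, seq in read_fasta(result_lines).items()}

# For every incomplete sequence, pick the full-length entry with the same accession.
complete = {acc: database[acc] for acc in fragments if acc in database}
print(complete["E7EWP2"])  # ARRKEFIMAELIQTEKAYVRDLRECMDTYLWEMTSGVEEIP
```

With around 70 sequences, the same dictionary lookup applies unchanged. Only the two in-memory lists would be replaced by the real result file and database file.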

I would like to get list_seq = ['ARR', 'KEFIMAELIQTEKAYVRDLRECMDTYLWEMTSGVE', 'EIP'], so that I can then use the first and last elements (the 3 residues on each side) to rebuild the complete sequence. However, creating list_seq with the 'split' command does not work.
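A likely reason `split` does not give this list is that `str.split` discards the separator, so splitting the full sequence on the fragment returns only `['ARR', 'EIP']`. A minimal sketch using `str.partition`, which keeps the matched part, is shown here. The function name `split_around` is hypothetical, and the sketch assumes the fragment occurs exactly as a substring of the database sequence.

```python
def split_around(full_seq, fragment):
    """Return [residues before, fragment, residues after] for the first match."""
    before, match, after = full_seq.partition(fragment)
    if not match:
        # partition returns an empty middle element when nothing matched
        raise ValueError("fragment not found in full sequence")
    return [before, match, after]


full_seq = "ARRKEFIMAELIQTEKAYVRDLRECMDTYLWEMTSGVEEIP"
fragment = "KEFIMAELIQTEKAYVRDLRECMDTYLWEMTSGVE"

list_seq = split_around(full_seq, fragment)
print(list_seq)  # ['ARR', 'KEFIMAELIQTEKAYVRDLRECMDTYLWEMTSGVE', 'EIP']

# What split does instead: the fragment itself is dropped.
print(full_seq.split(fragment))  # ['ARR', 'EIP']
```

`partition` only splits at the first occurrence, so a fragment that appears more than once in a protein would need extra handling.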

Thanks in advance
