Reputation: 17
I'm trying to pass a string to the one_frame() function:
one_frame("catatgdaftgaatg")
What I want back is the list ["atgdaf","atg"]. Whenever I find "atg" in the string, I want to grab that "atg" and everything after it until I reach "taa", "tga", or "tag". This is what I have so far, but it only returns ["atg"].
def get_orf(dna_seq):
    for x in dna_seq:
        if("taa" in dna_seq or "tag" in dna_seq or "tga" in dna_seq):
            dna_seq=dna_seq.replace("taa","")
            dna_seq=dna_seq.replace("tga","")
            dna_seq=dna_seq.replace("tag","")
            return dna_seq
        else:
            return dna_seq

def one_frame(dna):
    c=0
    q=3
    dna_list=[]
    dna_string=""
    while(q<=len(dna)):
        dna_string=dna[c:q]
        c=c+3
        q=q+3
        if(dna_string=="atg"):
            dna_list.append(get_orf(dna_string))
Upvotes: 1
Views: 81
Reputation: 1
If I understood your task correctly, you could use the split method with "atg" as the delimiter. That gives you a list of substrings. For each substring after the first, find the indices where taa, tga, and tag occur, cut the substring at the smallest of those indices, and put the "atg" back in front (split removes the delimiter).
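A minimal sketch of that split idea, in case it helps. The function name orfs_by_split and the STOP_CODONS tuple are made up for illustration, and it assumes a stop codon that overlaps the "atg" itself (as in "atga...") does not need to be detected.

```python
STOP_CODONS = ("taa", "tga", "tag")

def orfs_by_split(dna):
    """Return each "atg" plus what follows it, up to the first stop codon."""
    # Everything before the first "atg" is not part of any result, so drop it.
    pieces = dna.split("atg")[1:]
    result = []
    for piece in pieces:
        # Positions of every stop codon that occurs in this piece.
        hits = [piece.find(stop) for stop in STOP_CODONS]
        hits = [i for i in hits if i != -1]
        # Cut at the earliest stop codon, or keep the whole piece if none.
        cut = min(hits) if hits else len(piece)
        # split() removed the delimiter, so put "atg" back in front.
        result.append("atg" + piece[:cut])
    return result

print(orfs_by_split("catatgdaftgaatg"))  # ['atgdaf', 'atg']
```

Because split already stops each piece at the next "atg", a piece never runs into the following start codon.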
Upvotes: 0
Reputation: 2093
dna_list.append(get_orf(dna_string+dna[c:q+3]))
Just update the append call. Since dna_string is always 3 characters long, you will never get the desired 6-character result.
Update: I forgot the other conditions you mentioned.
def get_orf(dna_seq):
    for x in dna_seq:
        # print(dna_seq)
        if("taa" in dna_seq or "tag" in dna_seq or "tga" in dna_seq):
            # strip every stop codon from the remainder of the string
            dna_seq=dna_seq.replace("taa","")
            dna_seq=dna_seq.replace("tga","")
            dna_seq=dna_seq.replace("tag","")
            # keep only the leading "atg"; drop any later ones
            if('atg' in dna_seq[3:]):
                dna_seq="atg"+dna_seq[3:].replace("atg","")
            return dna_seq
        else:
            return dna_seq

def one_frame(dna):
    c=0
    q=3
    dna_list=[]
    dna_string=""
    while(q<=len(dna)):
        dna_string=dna[c:q]
        c=c+3
        q=q+3
        print(dna_string)
        if(dna_string=="atg"):
            # pass everything from this "atg" onward, not just 3 characters
            dna_list.append(get_orf(dna[c-3:]))
    print(dna_list)
Upvotes: 1
Reputation: 3039
In your method you move through the string in steps of 3, so any atg that does not start at a multiple-of-3 offset is skipped. Your second problem is that you always pass only 3 characters to get_orf. You need to pass the rest of the string so it can find the stop codons you are looking for.
def get_orf(dna_seq):
    for counter, val in enumerate(dna_seq):
        if val == "t" and counter < len(dna_seq)-2:
            if dna_seq[counter+1:counter+3] in ["aa", "ag"]:
                return dna_seq[:counter]
            if dna_seq[counter+1:counter+3] == "ga":
                return dna_seq[:counter]
    return dna_seq

def one_frame(dna):
    c = 0
    dna_list = []
    while c <= len(dna)-3:
        dna_string = dna[c:]
        if dna_string.startswith("atg"):
            dna_list.append(get_orf(dna_string))
        c += 1
    return dna_list
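For comparison, here is a more compact sketch of the same one-character-at-a-time scan, swapping the explicit loops for a regular expression with a lookahead. The names ORF_PATTERN and one_frame_regex are made up for illustration, and it assumes a stop codon must come after the full "atg", so results can differ from the loop version when the "atg" is immediately followed by an "a".

```python
import re

# The lookahead (?=...) matches without consuming characters, so the search
# restarts at every position and overlapping candidates are all found.
# "atg.*?" lazily grows until the next thing is a stop codon or end of string.
ORF_PATTERN = re.compile(r"(?=(atg.*?)(?:taa|tga|tag|$))")

def one_frame_regex(dna):
    """Return every "atg..." stretch, cut before the first stop codon."""
    return ORF_PATTERN.findall(dna)

print(one_frame_regex("catatgdaftgaatg"))  # ['atgdaf', 'atg']
```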
Upvotes: 0