I have written a Python function that randomly generates a nucleotide sequence:
```python
import random

# Pool of 100 bases: 20 A, 20 T, 30 G, 30 C (60% GC)
selection60 = {"A": 20, "T": 20, "G": 30, "C": 30}
sseq60 = []
for base, count in selection60.items():
    sseq60 += [base] * count

def generateSequence(pool, length):
    # random.sample draws without replacement, so the result is already
    # randomly ordered; length must not exceed len(pool) (ValueError otherwise)
    return "".join(random.sample(pool, int(length)))
```
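Because random.sample draws without replacement and the pool holds exactly 100 letters, generateSequence(sseq60, 100) simply returns a shuffle of the pool: every sequence has exactly 20 A, 20 T, 30 G and 30 C. If only the expected composition should be 60% GC, random.choices samples with replacement instead. That alternative is swapped in here for comparison and is not part of the original code; the name `pool` is illustrative.

```python
import random
from collections import Counter

# Same composition as selection60 in the question (60% GC)
selection60 = {"A": 20, "T": 20, "G": 30, "C": 30}
pool = [base for base, count in selection60.items() for _ in range(count)]

# Without replacement: a permutation of the 100-letter pool,
# so every sequence has exactly 20 A, 20 T, 30 G and 30 C
exact = "".join(random.sample(pool, len(pool)))
print(Counter(exact))

# With replacement (alternative): each base is drawn independently with
# weights 20:20:30:30, so the counts fluctuate from sequence to sequence
weighted = "".join(random.choices(list(selection60), weights=list(selection60.values()), k=100))
print(Counter(weighted))
```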
Now I would like each newly generated sequence to be checked against the previous ones: if it is more than 10% similar to any earlier sequence, it should be discarded and a new one generated. I wrote something like this:
```python
lst60=[]
newSeq=[]
for i in range(5):
    while max_identity < 10:
        newSeq=generateSequence(sseq60,100)
        identity[i] = [newSeq[i] == newSeq[i] for i in range(len(newSeq[i]))]
        max_identity[I]=100*sum(identity[i]/length(identity[i])
    lst60.append(newSeq)
print(len(lst60))
```
However, I seem to get an empty list.
You have to use a nested loop if you want to compare the i-th sequence with the j-th sequence for every 0 <= j < i.
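The inner loop over j < i only has to answer one question: is the candidate too similar to any earlier sequence? A minimal sketch of that check, using hypothetical helper names (`identity`, `is_distinct`) and the same position-by-position identity used in this answer. Here all() stops at the first earlier sequence that reaches the threshold, rather than computing the maximum over all of them.

```python
def identity(a, b):
    # Percentage of positions where a and b carry the same base
    return 100 * sum(x == y for x, y in zip(a, b)) / len(a)

def is_distinct(candidate, previous, threshold=10):
    # Compare the candidate with every earlier sequence (all j < i);
    # all() short-circuits on the first one that is too similar
    return all(identity(old, candidate) < threshold for old in previous)

previous = ["AAAA", "CCCC"]
print(is_distinct("GGGG", previous))  # True: 0% identity to both
print(is_distinct("AAGG", previous))  # False: 50% identity to "AAAA"
```

Since all() of an empty iterable is True, the very first sequence is always accepted.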
I also moved the comparison into a separate getSimilarity function for readability. Pass it an old and a new sequence and it returns the percentage of positions at which the two carry the same base.
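```python
def getSimilarity(old_seq, new_seq):
    # Percentage of positions at which the two sequences carry the same base
    similarity = [a == b for a, b in zip(old_seq, new_seq)]
    return 100 * sum(similarity) / len(similarity)

lst60 = [generateSequence(sseq60, 100)]
for i in range(1, 5):
    while True:
        newSeq = generateSequence(sseq60, 100)
        # Reset for every candidate: otherwise a single rejected candidate
        # keeps max_identity >= 10 and the while loop never ends
        max_identity = 0
        # Compare the new candidate with every earlier sequence j < i
        for j in range(0, i):
            max_identity = max(max_identity, getSimilarity(lst60[j], newSeq))
        if max_identity < 10:
            break
    lst60.append(newSeq)
print(len(lst60))
```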
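One caveat worth checking: with this base composition, two independently shuffled sequences already agree at about 26% of positions on average, because at each position P(match) = 0.2² + 0.2² + 0.3² + 0.3² = 0.26. A 10% cutoff therefore rejects almost every candidate, and the while True loop may run for a very long time, especially for later sequences that must pass against all earlier ones. The sketch below computes that expectation exactly and estimates it by simulation; the helper names and the trial count are illustrative assumptions.

```python
import random
from fractions import Fraction

# Same composition as selection60 in the question
selection60 = {"A": 20, "T": 20, "G": 30, "C": 30}
pool = [base for base, count in selection60.items() for _ in range(count)]
N = len(pool)

# Analytical expectation: at any position P(match) = sum over bases of (n_k / N)^2
expected = 100 * sum(Fraction(n, N) ** 2 for n in selection60.values())
print(float(expected))  # 26.0

def identity(a, b):
    # Percentage of positions where a and b carry the same base
    return 100 * sum(x == y for x, y in zip(a, b)) / len(a)

# Empirical check using the same without-replacement sampling as generateSequence
trials = 2000
scores = [identity(random.sample(pool, N), random.sample(pool, N)) for _ in range(trials)]
print(sum(scores) / trials)         # close to 26
print(sum(s < 10 for s in scores))  # pairs that would pass a 10% cutoff
```

If very few pairs pass, it may make sense to pick the cutoff relative to this chance level rather than a fixed 10%.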