Reputation: 499
I want to align two DNA sequences in an optimal way, but I have the gap penalty function of length L, that if L is a multiple of 3, the penalty is a * L for some constant a. If L is not a multiple of 3, then the penalty is b * L for some constant b.
I am supposed to design a O(n * m) algorithm, where n and m are lengths of DNA sequences, that finds the optimal alignment. But the tricky part about this is that I have to keep track of the size of the gap it is extending. For example, if I have two contiguous gaps and extend one gap further, I would need to update the score by aL - b(L-1), but I couldn't formulate subproblems that handle this situation well. I've thought about introducing a new parameter L for "guessing" the length of the final gap, but that would easily exceeds O(n * m).
Is there a way to formulate these subproblems effectively? Any key observations would be greatly appreciated.
Upvotes: 0
Views: 580
Reputation: 759
Can you clarify on what you mean by contiguous gaps? If I'm understanding you correctly, an example alignment of your scenario would look something like this: AAATTTGGGAA----CCCGG AAATTTCGGAA-----GCGG However, including matching gaps on both strands accomplishes nothing. The above alignment is identical to this: AAATTTGGGAACCCGG AAATTTCGGAA-GCGG A gap penalty should only be relevant for a single strand at any given time. To gauge how changing alignment parameters impacts the alignment output, I suggest looking at the biopython Pairwise2 module (which can include a gap penalty/gap extension penalty). http://biopython.org/DIST/docs/api/Bio.pairwise2-module.html
Upvotes: 0