Reputation: 2886
Im trying the global alignment
method from the Biopython
module. Using it on short sequences is easy and gives an alignment matrix straightaway. However I really need to run it on larger sequences I have (an average lenght of 2000 nucleatides (or) characters
). However I keep running into the Out of Memory
error. I looked on SO and found this previous question. The answers provided are not helpfull as they link to the this same website which cant be accessed now.Apart from this I have tried these steps:
64-bit
python since my personal computer has 4gb
RAM. ssh
ed to a small school server with 16gb
RAM and tried running on that. Its still running after close to 4 hours.Since its is a small script im unsure how to modify it. ANy help will be greatly appreciated.
My script:
import os
from Bio import pairwise2
from Bio.pairwise2 import format_alignment
file_list = []
file_list = [each for each in os.listdir(os.getcwd()) if each.endswith(".dna")]
align_file = open("seq_align.aln","w")
seq_list = []
for each_file in file_list:
f_o = open(each_file,"r")
seq_list.append(f_o.read())
for a in pairwise2.align.globalmx(seq_list[0],seq_list[1]):
align_file.write(format_alignment(*a))
align_file.close()
Upvotes: 2
Views: 818
Reputation: 2886
So the school server finally completed the task. What I realized was that for each alignment there were 1000 matrices constructed and calculated. The align.globalxx
method has a variable MAX_ALIGNMENT
which is by default set to 1000
. Changing it via monkey patching
dint really change anything . The documentation says that, the method tries all possible alignments (yes 1000) but in my case all the matrices have the same alignment score (as well as few test sequences I tried). Finally a small piece of comment in the documentation states that if you need only 1 score, use the optional parameter one_alignment_only
which accepts a boolean
value only. All i did was this:
for a in pairwise2.align.globalmx(seq_list[0],seq_list[1],one_alignment_only=True):
align_file.write(format_alignment(*a))
This reduced the time considerably. However my PC still crashed so I assume this is a very memory intensive task and requires much more RAM
(16gb
on a small server). So probably a more efficient way of reading the sequences in the matrix should be thought of.
Upvotes: 3