Biopython Global Alignment : Out of Memory

Question

I'm trying the global alignment method from the Biopython module. Using it on short sequences is easy and gives an alignment matrix straight away. However, I really need to run it on the larger sequences I have (an average length of 2000 nucleotides, i.e. characters), and I keep running into an Out of Memory error. I looked on SO and found this previous question. The answers provided are not helpful, as they link to the same website, which can't be accessed now. Apart from this, I have tried these steps:

I tried using 64-bit Python, since my personal computer has 4 GB of RAM.
I SSHed into a small school server with 16 GB of RAM and tried running it there. It's still running after close to 4 hours.

Since it is a small script, I'm unsure how to modify it. Any help will be greatly appreciated.

My script:

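import os
from Bio import pairwise2
from Bio.pairwise2 import format_alignment

# Collect every .dna file in the current working directory.
file_list = [each for each in os.listdir(os.getcwd()) if each.endswith(".dna")]

# Read each file's sequence, stripping leading and trailing whitespace.
seq_list = []
for each_file in file_list:
    with open(each_file, "r") as f_o:
        seq_list.append(f_o.read().strip())

# Align the first two sequences and write every returned alignment.
# globalmx requires match and mismatch scores; globalxx (match 1, mismatch 0,
# no gap penalties) is used here as an assumption about the intended scoring.
with open("seq_align.aln", "w") as align_file:
    for a in pairwise2.align.globalxx(seq_list[0], seq_list[1]):
        align_file.write(format_alignment(*a))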

Answers (1)
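
One possible direction, offered as a sketch rather than a definitive fix: if only the alignment score is needed, the dynamic-programming table does not have to be kept in full. Two rows are enough, so memory grows with the length of the shorter sequence instead of with the product of both lengths. The function below is plain Python and is not Biopython's implementation; the name global_alignment_score and its default scores (chosen to mirror globalxx-style scoring: match 1, mismatch 0, no gap penalty) are assumptions made for illustration. Within pairwise2 itself, the align functions also accept the keyword arguments score_only=True and one_alignment_only=True, which may be worth trying before rewriting anything.

```python
def global_alignment_score(seq_a, seq_b, match=1, mismatch=0, gap=0):
    """Needleman-Wunsch global alignment score using two table rows.

    Hypothetical helper for illustration; linear gap penalty only.
    """
    # Keep the shorter sequence along the row so each row stays small.
    if len(seq_b) > len(seq_a):
        seq_a, seq_b = seq_b, seq_a

    # Row 0: aligning an empty prefix of seq_a against prefixes of seq_b.
    previous = [j * gap for j in range(len(seq_b) + 1)]

    for i, char_a in enumerate(seq_a, start=1):
        # Column 0: aligning a prefix of seq_a against an empty prefix.
        current = [i * gap]
        for j, char_b in enumerate(seq_b, start=1):
            diagonal = previous[j - 1] + (match if char_a == char_b else mismatch)
            gap_in_b = previous[j] + gap      # char_a aligned to a gap
            gap_in_a = current[j - 1] + gap   # char_b aligned to a gap
            current.append(max(diagonal, gap_in_b, gap_in_a))
        previous = current  # the older row is no longer needed

    return previous[-1]


if __name__ == "__main__":
    print(global_alignment_score("ACCGT", "ACG"))
```

This returns only the score, not the aligned strings. Recovering one alignment in linear space needs a divide-and-conquer scheme such as Hirschberg's algorithm, which is not shown here. For two sequences of about 2000 characters the loop visits roughly four million cells, which is slow in pure Python but should stay small in memory.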