Reputation: 105
I would like to run the following code on multiple FASTQ files in a folder. The folder contains several .fastq files: I need to read the first file, perform the required operations, and store the results in a separate .fastq file, then read the second file, perform the same operations, and save the results in a second new .fastq file, and so on for every file in the folder.
How can I do this? Can someone suggest a way?
from Bio.SeqIO.QualityIO import FastqGeneralIterator
fout=open("prova_FiltraN_CE_filt.fastq","w")
fin=open("prova_FiltraN_CE.fastq","rU")
maxN=0
countall=0
countincl=0
with open("prova_FiltraN_CE.fastq", "rU") as handle:
    for (title, sequence, quality) in FastqGeneralIterator(handle):
        countN = sequence.count("N", 0, len(sequence))
        countall+=1
        if countN==maxN:
            fout.write("@%s\n%s\n+\n%s\n" % (title, sequence, quality))
            countincl+=1
fin.close
fout.close
print countall, countincl
Upvotes: 1
Views: 1068
Reputation: 123463
I think the following will do what you want. I turned your code into a function (and modified it to be what I think is more correct), then called that function for every .fastq file found in the designated folder. The output file names are generated from the input file names.
from Bio.SeqIO.QualityIO import FastqGeneralIterator
import glob
import os

def process(in_filepath, out_filepath):
    maxN = 0  # keep only reads with exactly this many "N" bases
    countall = 0  # reads seen
    countincl = 0  # reads written to the output file
    with open(in_filepath, "r") as fin:
        with open(out_filepath, "w") as fout:
            for (title, sequence, quality) in FastqGeneralIterator(fin):
                countN = sequence.count("N")
                countall += 1
                if countN == maxN:
                    fout.write("@%s\n%s\n+\n%s\n" % (title, sequence, quality))
                    countincl += 1
    # report per-file totals: file name, reads seen, reads kept
    print(os.path.split(in_filepath)[1], countall, countincl)

folder = "/path/to/folder" # folder to process
for in_filepath in glob.glob(os.path.join(folder, "*.fastq")):
    root, ext = os.path.splitext(in_filepath)
    if not root.endswith("_filt"): # avoid processing existing output files
        out_filepath = root + "_filt" + ext
        process(in_filepath, out_filepath)
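For what it's worth, the naming rule used above (append _filt before the extension, and skip files that already end in _filt) can be pulled out into small helpers so it is easy to check on its own. This is only a minimal sketch using pathlib and assuming Python 3; the names output_path_for and iter_jobs are hypothetical and not part of Biopython or of the code above.

```python
from pathlib import Path


def output_path_for(in_path, suffix="_filt"):
    """Return the output path for in_path, e.g. a.fastq -> a_filt.fastq.

    Hypothetical helper: inserts `suffix` before the file extension.
    """
    in_path = Path(in_path)
    return in_path.with_name(in_path.stem + suffix + in_path.suffix)


def iter_jobs(folder, suffix="_filt"):
    """Yield (input, output) path pairs for each unprocessed .fastq file.

    Hypothetical helper: files whose name already ends in `suffix`
    (before the extension) are treated as earlier outputs and skipped.
    """
    for in_path in sorted(Path(folder).glob("*.fastq")):
        if in_path.stem.endswith(suffix):
            continue
        yield in_path, output_path_for(in_path, suffix)
```

Each pair yielded by iter_jobs("/path/to/folder") could then be passed to process(in_path, out_path), since open() accepts Path objects. Sorting the matches just makes the processing order predictable; glob itself does not guarantee any order.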
Upvotes: 2