Chiara E

Reputation: 105

How to run code on multiple fastq files?

I would like to run the following code on multiple fastq files in a folder. The folder contains several fastq files: I need to read the first file, perform the required operations, and store the results in a separate .fastq file, then read the second file, perform the same operations, and save the results in a second new .fastq file, and so on for every file in the folder.

How can I do this? Can someone suggest a way?

from Bio.SeqIO.QualityIO import FastqGeneralIterator
fout=open("prova_FiltraN_CE_filt.fastq","w")
fin=open("prova_FiltraN_CE.fastq","rU")
maxN=0
countall=0
countincl=0
with open("prova_FiltraN_CE.fastq", "rU") as handle:
    for (title, sequence, quality) in FastqGeneralIterator(handle):
        countN = sequence.count("N", 0, len(sequence))
        countall+=1
        if countN==maxN:
            fout.write("@%s\n%s\n+\n%s\n" % (title, sequence, quality))
            countincl+=1
fin.close
fout.close
print countall, countincl

Upvotes: 1

Views: 1068

Answers (1)

martineau

Reputation: 123463

I think the following will do what you want. I turned your code into a function (modifying it into what I think is more correct) and then call that function for every .fastq file found in the designated folder. The output file names are generated from the names of the input files found.

from Bio.SeqIO.QualityIO import FastqGeneralIterator
import glob
import os

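def process(in_filepath, out_filepath):
    """Copy reads without any "N" base to out_filepath and print the counts."""
    maxN = 0       # number of "N" bases a read must have to be kept
    countall = 0   # reads seen
    countincl = 0  # reads written to the output file
    # Both files are closed automatically when the with block ends.
    with open(in_filepath) as fin, open(out_filepath, "w") as fout:
        for title, sequence, quality in FastqGeneralIterator(fin):
            countN = sequence.count("N")
            countall += 1
            if countN == maxN:
                fout.write("@%s\n%s\n+\n%s\n" % (title, sequence, quality))
                countincl += 1
    # Python 3 print; on Python 2 use the print statement instead.
    print(os.path.basename(in_filepath), countall, countincl)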

folder = "/path/to/folder"  # folder to process
for in_filepath in glob.glob(os.path.join(folder, "*.fastq")):
    root, ext = os.path.splitext(in_filepath)
    if not root.endswith("_filt"):  # avoid processing existing output files
        out_filepath = root + "_filt" + ext
        process(in_filepath, out_filepath)
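
If you want to check how the output names are derived before touching any files, the naming step in the loop above can be pulled out on its own. This is only a sketch: `filtered_name` is a hypothetical helper name I made up (it is not part of Biopython), and the `/data/...` paths are placeholders.

```python
import os


def filtered_name(in_filepath, suffix="_filt"):
    """Return the output path for in_filepath, or None if it is already an output file.

    Hypothetical helper: mirrors the splitext/endswith logic of the loop above.
    """
    root, ext = os.path.splitext(in_filepath)
    if root.endswith(suffix):
        # Already a filtered file from a previous run, so skip it.
        return None
    return root + suffix + ext


# Placeholder paths, for illustration only.
print(filtered_name("/data/prova_FiltraN_CE.fastq"))       # /data/prova_FiltraN_CE_filt.fastq
print(filtered_name("/data/prova_FiltraN_CE_filt.fastq"))  # None
```

Skipping names that already end in `_filt` is what keeps a second run from filtering its own output files again.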

Upvotes: 2
