Reputation: 315
I'm trying to create a workflow that takes a directory of input files, runs each one through a command-line tool, and writes the results to an output directory. It should be really simple, and I have gotten it to work ... mostly.
The problem is that whenever I give it an input directory, I get the error "Skipping FILE which didn't exist, or couldn't be read", even though I am 100% certain that the files exist in my input directory.
However, if I alter the code slightly so that I feed it a single input file rather than a directory, the script runs as it should and completes perfectly.
My input files are gzipped.
Here is the script:
import argparse
import subprocess
import os
parser = argparse.ArgumentParser(description="A RNAseq pipeline for pair-end data")
parser.add_argument("-i", "--inputDir", help="A input directory containing your gzipped fastq files", required=True)
parser.add_argument("-o", "--outputDir", help="Output directory", required=True)
parser.parse_args()
### Define global variables
args = parser.parse_args()
inputDir = args.inputDir
outputDir = args.outputDir
### Grab all fastq files in input directory
fastq_directory = os.listdir("{}".format(inputDir))
fastq_files = []
for file in fastq_directory:
    fastq_files.append(file)
### Run FastQC
for file in fastq_files:
    fastqc_command = "fastqc --extract -o {} {}".format(outputDir, file)
    subprocess.check_output(['bash', '-c', fastqc_command])
The error:
Skipping 'KO1_R1.fastq.gz' which didn't exist, or couldn't be read
Skipping 'KO1_R2.fastq.gz' which didn't exist, or couldn't be read
Skipping 'KO2_R1.fastq.gz' which didn't exist, or couldn't be read
Skipping 'KO2_R2.fastq.gz' which didn't exist, or couldn't be read
Skipping 'KO3_R1.fastq.gz' which didn't exist, or couldn't be read
Skipping 'KO3_R2.fastq.gz' which didn't exist, or couldn't be read
Skipping 'WT1_R1.fastq.gz' which didn't exist, or couldn't be read
Skipping 'WT1_R2.fastq.gz' which didn't exist, or couldn't be read
Skipping 'WT2_R1.fastq.gz' which didn't exist, or couldn't be read
Skipping 'WT2_R2.fastq.gz' which didn't exist, or couldn't be read
Skipping 'WT3_R1.fastq.gz' which didn't exist, or couldn't be read
Skipping 'WT3_R2.fastq.gz' which didn't exist, or couldn't be read
Any recommendations?
PS: I know the script is terrible, but I'm learning :). Suggestions are definitely welcome, though!
Upvotes: 0
Views: 2166
Reputation: 24719
Try changing this:
fastq_directory = os.listdir("{}".format(inputDir))
fastq_files = []
for file in fastq_directory:
    fastq_files.append(file)
To this:
fastq_directory = os.listdir("{}".format(inputDir))
fastq_files = []
for file in fastq_directory:
    fastq_files.append(os.path.join(inputDir, file))
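This is because os.listdir() returns only bare filenames, not full paths. FastQC therefore looks for each file relative to the current working directory instead of inside inputDir, and os.path.join() puts the directory prefix back.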
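If it helps, here is a minimal sketch of the same idea wrapped in small functions. The function names (list_fastq_paths, build_fastqc_command, run_fastqc) and the ".fastq.gz" suffix filter are my own assumptions for illustration, not part of your script; the fastqc flags are the ones you already use. It also passes the command as an argument list instead of going through bash -c, which should avoid quoting problems if a path contains spaces.

```python
import os
import subprocess


def list_fastq_paths(input_dir):
    """Return full paths of the gzipped FASTQ files in input_dir, sorted by name."""
    return [
        os.path.join(input_dir, name)  # os.listdir() yields bare names only
        for name in sorted(os.listdir(input_dir))
        if name.endswith(".fastq.gz")  # assumed suffix, taken from the error output
    ]


def build_fastqc_command(output_dir, fastq_path):
    """Build the FastQC call as an argument list, so no shell quoting is needed."""
    return ["fastqc", "--extract", "-o", output_dir, fastq_path]


def run_fastqc(input_dir, output_dir):
    """Run FastQC once per input file; raises CalledProcessError if a run fails."""
    os.makedirs(output_dir, exist_ok=True)
    for fastq_path in list_fastq_paths(input_dir):
        subprocess.run(build_fastqc_command(output_dir, fastq_path), check=True)
```

With your argparse setup you would then call run_fastqc(args.inputDir, args.outputDir) in place of the two loops.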
Upvotes: 2