a_riou

Reputation: 25

Snakemake: Missing input files for rule all

I'm developing my first Snakemake workflow and I'm stuck because of an error.

To test my code, I started with a single rule, fastqc. However, when I run Snakemake, I get the following error message:

MissingInputException in line 24 of /ngs/prod/nanocea_project/test/Snakefile:
Missing input files for rule all:
stats/fastqc/02062021_1/02062021_1_fastqc.html
stats/fastqc/02062021_1/02062021_1_fastqc.zip
stats/fastqc/02062021_2/02062021_2_fastqc.html
stats/fastqc/25022021_2/25022021_2_fastqc.zip
stats/fastqc/25022021_2/25022021_2_fastqc.html
stats/fastqc/02062021_2/02062021_2_fastqc.zip

Here is my code:

import glob
import os

###Global Variables###

FORMATS=["zip", "html"]
OUTDIR="/ngs/prod/nanocea_project/test/stats/fastqc"
DIR_FASTQ="/ngs/prod/nanocea_project/test/reads"

###FASTQ Files###

def list_samples(DIR_FASTQ):
        SAMPLES=[]
        for file in glob.glob(DIR_FASTQ+"/*.fastq.gz"):
                base=os.path.basename(file)
                sample=(base.replace('.fastq.gz', ''))
                SAMPLES.append(sample)
        return(SAMPLES)

SAMPLES=list_samples(DIR_FASTQ)

###Rules###

rule all:
        input:
                expand("stats/fastqc/{sample}/{sample}_fastqc.{ext}", sample=SAMPLES, ext=FORMATS)

rule fastqc:
        input:
                expand(DIR_FASTQ+"/{sample}.fastq.gz", sample=SAMPLES)
        output:
                expand(OUTDIR+"/{sample}_fastqc.{ext}", sample=SAMPLES, ext=FORMATS)
        threads:
                16
        conda:
                "envs/fastqc.yaml"
        shell:
                """
                mkdir stats/fastqc/{sample}
                fastqc {input} -o {OUTDIR}/{sample} -t {threads}
                """

And here is the structure of my files:

|
|_ Snakefile
|
|_/reads
|   |
|   |_25022021_2.fastq.gz
|   |
|   |_02062021_1.fastq.gz
|   |
|   |_02062021_2.fastq.gz
|
|_/envs
|   |
|   |_fastqc.yaml
|
|_/stats
|   |
|   |_/fastqc

I searched the other topics for solutions to my problem, but I couldn't get my workflow to work.

Do you have any ideas?

Thank you!

EDIT after dariober's answer

Thank you for this answer. After several attempts, the only solution that worked was to hard-code the full path directly in the all and fastqc rules.

First question: why did my global variable not work, even though I modified it to match my all rule?

Second question: now that the first problem is solved, a new one appears when I run my workflow with snakemake --use-conda --cores 40:

RuleException in line 28 of /ngs/prod/nanocea_project/test/Snakefile: NameError: The name 'sample' is unknown in this context. Please make sure that you defined that variable. Also note that braces not used for variable access have to be escaped by repeating them, i.e. {{print $1}}

I tried doubling the braces, but then the mkdir command creates a folder literally named {sample}. I don't understand why it creates this folder.
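
For illustration, the behaviour with doubled braces can be reproduced in plain Python. This is only a sketch, assuming that Snakemake's shell formatting follows the same brace conventions as Python's str.format, which is what the error message suggests. The variable names are made up for the example.

```python
# Sketch only: plain str.format, used here as an assumed analogy for how
# the shell string of a rule is filled in.
single = "mkdir stats/fastqc/{sample}"
doubled = "mkdir stats/fastqc/{{sample}}"

# Single braces ask the formatter to look up a value named "sample".
# If no such value is supplied, the lookup fails.
try:
    single.format()
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Doubled braces are an escape: they produce literal braces and no lookup.
escaped_command = doubled.format()

# Supplying the value is what actually fills in the sample name.
filled_command = single.format(sample="02062021_1")

print(lookup_failed)
print(escaped_command)
print(filled_command)
```

Under that assumption, doubling the braces only silences the lookup: the shell receives the text {sample} unchanged and mkdir creates a folder with that name. The rule still has no per-sample value to substitute, because expand fills in every sample in input and output.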

The new code:

import glob
import os

###Global Variables###

FORMATS=["zip", "html"]
DIR_FASTQ="/ngs/prod/nanocea_project/test/reads"

###FASTQ Files###

def list_samples(DIR_FASTQ):
        SAMPLES=[]
        for file in glob.glob(DIR_FASTQ+"/*.fastq.gz"):
                base=os.path.basename(file)
                sample=(base.replace('.fastq.gz', ''))
                SAMPLES.append(sample)
        return(SAMPLES)

SAMPLES=list_samples(DIR_FASTQ)

###Rules###

rule all:
        input:
                expand("/ngs/prod/nanocea_project/test/stats/fastqc/{sample}/{sample}_fastqc.{ext}", sample=SAMPLES, ext=FORMATS)

rule fastqc:
        input:
                expand(DIR_FASTQ+"/{sample}.fastq.gz", sample=SAMPLES)
        output:
                expand("/ngs/prod/nanocea_project/test/stats/fastqc/{sample}/{sample}_fastqc.{ext}", sample=SAMPLES, ext=FORMATS)
        threads:
                16
        conda:
                "envs/fastqc.yaml"
        shell:
                """
                mkdir stats/fastqc/{sample}
                fastqc {input} -o /ngs/prod/nanocea_project/test/stats/fastqc/{sample} -t {threads}
                """

Upvotes: 1

Views: 3347

Answers (1)

dariober

Reputation: 9062

In rule all you have:

stats/fastqc/...

but in rule fastqc, after expanding the OUTDIR variable, you have:

/ngs/prod/nanocea_project/test/stats/fastqc/...

Even though both point to the same directory, Snakemake compares the paths as strings, so they don't match and it raises the error.
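
As a rough illustration of the mismatch, the sketch below rebuilds both lists of paths in plain Python. The helper expand_pattern is a hypothetical stand-in for Snakemake's expand (it is not Snakemake's implementation), and the sample names are taken from the error message in the question. It assumes, as described above, that requested and declared paths have to agree as strings.

```python
from itertools import product

# Values taken from the question; sample names come from the error message.
SAMPLES = ["02062021_1", "02062021_2", "25022021_2"]
FORMATS = ["zip", "html"]
OUTDIR = "/ngs/prod/nanocea_project/test/stats/fastqc"


def expand_pattern(pattern, **values):
    """Hypothetical stand-in for expand(): fill every combination of values."""
    keys = list(values)
    return [
        pattern.format(**dict(zip(keys, combo)))
        for combo in product(*values.values())
    ]


# What rule all asks for (relative paths, with a {sample}/ subdirectory).
requested = set(
    expand_pattern(
        "stats/fastqc/{sample}/{sample}_fastqc.{ext}",
        sample=SAMPLES,
        ext=FORMATS,
    )
)

# What rule fastqc declares as output in the original Snakefile.
declared = set(
    expand_pattern(OUTDIR + "/{sample}_fastqc.{ext}", sample=SAMPLES, ext=FORMATS)
)

# Targets that no rule output matches as a string.
unmatched = requested - declared
print(len(unmatched))  # 6: none of the requested files is declared
for path in sorted(unmatched):
    print(path)
```

With the patterns from the question, none of the six targets is declared by rule fastqc. Note that the two patterns in the original Snakefile also differ in the {sample}/ subdirectory, so making OUTDIR relative would not be enough on its own; both rules need to use the same pattern.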

Upvotes: 1
