Reputation: 447
I want to know if one can define a input rule that has dependencies on different wildcards.
To elaborate, I am running this Snakemake pipeline on different fastq files using qsub which submits each job to a different node:
MultiQC - https://multiqc.info/ - runs on the results folder which has results from fastqc, star, rsem, etc. However, because each job runs on a different node, sometimes Step 3 (fastqc and/or fastqc_after) is still running on the nodes while other steps finish running (Steps 2, 4 and 5) OR vice-versa.
Currently, I can create a MultiQc rule which waits on results from Steps 2, 4, 5 because they are linked to each other by input/output rules.
I have attached my pipeline as png to this post. Any suggestions would help.
What I need: I want to create a "collating" step where I want MultiQC to wait till all steps (from 1 to 5) finish. In other words, using my attached png as guide, I want to define multiple input rules for MultiQC that also wait on results from fastqc
Thanks in advance.
Note: Based on comments I received from 'colin' and 'bli' after my original post, I have shared the code for the different rules here.
Step 1 - fastqc
rule fastqc:
input: "raw_fastq/{sample}.fastq"
output: "results/fastqc/{sample}_fastqc.zip"
log: "results/logs/fq_before/{sample}.fastqc.log"
params: ...
shell: ...
Step 2 - bbduk
rule bbduk:
input: R1 = "raw_fastq/{sample}.fastq"
output: R1 = "results/bbduk/{sample}_trimmed.fastq",
params: ...
log: "results/logs/bbduk/{sample}.bbduk.log"
priority:95
shell: ....
Step 3 - fastqc_after
rule fastqc_after:
input: "results/bbduk/{sample}_trimmed.fastq"
output: "results/bbduk/{sample}_trimmed_fastqc.zip"
log: "results/logs/fq_after/{sample}_trimmed.fastqc.log"
priority: 70
params: ...
shell: ...
Step 4 - star_align
rule star_align:
input: R1 = "results/bbduk/{sample}_trimmed.fastq"
output:
out_1 = "results/bam/{sample}_Aligned.toTranscriptome.out.bam",
out_2 = "results/bam/{sample}_ReadsPerGene.out.tab"
params: ...
log: "results/logs/star/{sample}.star.log"
priority:90
shell: ...
Step 5 - rsem_norm
rule rsem_norm:
input:
bam = "results/bam/{sample}_Aligned.toTranscriptome.out.bam"
output:
genes = "results/quant/{sample}.genes.results"
params: ...
threads = 16
priority:85
shell: ...
Step 6 - rsem_model
rule rsem_model:
input: "results/quant/{sample}.genes.results"
output: "results/quant/{sample}_diagnostic.pdf"
params: ...
shell: ...
Step 7 - tximport_rsem
rule tximport_rsem:
input: expand("results/quant/{sample}_diagnostic.pdf",sample=samples)
output: "results/rsem_tximport/RSEM_GeneLevel_Summarization.csv"
shell: ...
Step 8 - multiqc
rule multiqc:
input: expand("results/quant/{sample}.genes.results",sample=samples)
output: "results/multiqc/project_QS_STAR_RSEM_trial.html"
log: "results/log/multiqc"
shell: ...
Upvotes: 2
Views: 601
Reputation: 8184
If you want rule multiqc
to happen only after fastqc
completed, you can add the output of fastqc
to the input of multiqc
:
rule multiqc:
input:
expand("results/quant/{sample}.genes.results",sample=samples),
expand("results/fastqc/{sample}_fastqc.zip", sample=samples)
output: "results/multiqc/project_QS_STAR_RSEM_trial.html"
log: "results/log/multiqc"
shell: ...
Or, if you need to be able to refer to the output of rsem_norm
in your shell
section:
rule multiqc:
input:
rsem_out = expand("results/quant/{sample}.genes.results",sample=samples),
fastqc_out = expand("results/fastqc/{sample}_fastqc.zip", sample=samples)
output: "results/multiqc/project_QS_STAR_RSEM_trial.html"
log: "results/log/multiqc"
shell: "... {input.rsem_out} ..."
In one of your comments, you wrote:
MultiQC needs directory as input - I give it the 'results' directory in my shell command.
If I understand correctly, this means that results/quant/{sample}.genes.results
are directories, and not plain files. If this is the case, you should make sure no downstream rule writes files inside those directories. Otherwise, the directories will be considered as having been updated after the output of multiqc
, and multiqc
will be re-run every time you run the pipeline.
Upvotes: 1