Reputation: 117
I am attempting to write a snakemake rule that will treat some files, based off their grouping
differently than others. My file list is loaded in in a sample.tsv file.
I thought this would be relatively easy as I believed that populating the wildcards in the rule all
would trigger execution of particular rules, but that does not appear to be the case.
Here is a paired down version of what I'm working on
List of sample files. Note that the chip
category here is what becomes important for defining my groups
tissue replicate chip file
leaf rep2 input 00.data/chip_seq/input/leaf_input_rep2.fastq
leaf rep1 input 00.data/chip_seq/input/leaf_input_rep1.fastq
leaf rep2 H3K36me3 00.data/chip_seq/H3K36me3/leaf_H3K36me3_rep2.fastq
leaf rep1 H3K36me3 00.data/chip_seq/H3K36me3/leaf_H3K36me3_rep1.fastq
leaf rep1 H3K56ac 00.data/chip_seq/H3K56ac/leaf_H3K56ac_rep1.fastq
leaf rep2 H3K56ac 00.data/chip_seq/H3K56ac/leaf_H3K56ac_rep2.fastq
In my script I have then divided these into two sub-categories
broad = ['H3K36me3']
narrow = ["H3K56ac"]
rule all:
input:
#Align all reads
expand("02.unique_align/{tissue}_{chip}_{replicate}_unique_bowtie2_algn.bam", \
¦ ¦ ¦ tissue = samples['tissue'], replicate = samples['replicate'], \
¦ ¦ ¦ chip = samples['chip']),
#Should cause the expand on ONLY narrow groups, causing the below rule
# run_bcp_peak_caller_narrow to trigger
expand("03.called_peaks/{tissue}_{replicate}_{chip}_peaks_region_narrow.bed",
¦ ¦ ¦ tissue = narrow_peaks['tissue'],
¦ ¦ ¦ replicate = narrow_peaks['replicate'],
¦ ¦ ¦ chip = narrow),
#Should cause the expand on ONLY narrow groups, causing the below rule
# run_bcp_peak_caller_broad to trigger
expand("03.called_peaks/{tissue}_{replicate}_{chip}_peaks_region_broad.bed",
¦ ¦ ¦ tissue = samples['tissue'],
¦ ¦ ¦ replicate = samples['replicate'],
¦ ¦ ¦ chip = broad)
## Two functions, one to get the input files, defined here as `get_input` the other to retrieve the chip files
def get_input(wildcards):
z = glob.glob(os.path.join("02.unique_align/", (wildcards.tissue + "_" + \
¦ wildcards.replicate + "_" + "input_unique_bowtie2_algn.bam")))
return z
def get_chip(wildcards):
z = glob.glob(os.path.join("02.unique_align/", (wildcards.tissue + "_" + \
¦ ¦ ¦ wildcards.replicate + "_" + wildcards.chip + "_" + \
¦ ¦ ¦ "unique_bowtie2_algn.bam")))
return z
rule run_bcp_peak_caller_broad:
input:
¦ chip_input = get_input,
¦ chip_mod = get_chip
params:
¦ "03.called_peaks/{tissue}_{replicate}_{chip}_peaks_broad"
output:
¦ "03.called_peaks/{tissue}_{replicate}_{chip}_peaks_broad.bed"
shell:"""
peakranger bcp \
--format bam \
--verbose \
--pval .001 \
--data {input.chip_mod} \
--control {input.chip_input} \
--output {params}
"""
rule run_bcp_peak_caller_narrow:
input:
chip_input = get_input,
chip_mod = get_chip
params:
"03.called_peaks/{tissue}_{replicate}_{chip}_peaks_narrow"
output:
"03.called_peaks/{tissue}_{replicate}_{chip}_peaks_narrow.bed"
shell:"""
peakranger \
--format bam \
--verbose \
--pval .001 \
--data {input.chip_mod} \
--control {input.chip_input} \
--output {params}
"""
Error is as follows:
MissingInputException in line 39 of /scratch/jpm73279/04.lncRNA/02.Analysis/24.regenerate_expression_peaks/Generate_peak_lists.snake:
Missing input files for rule all:
03.called_peaks/root_rep1_H3K4me1_peaks_region_broad.bed
03.called_peaks/root_rep2_H3K36me3_peaks_region_broad.bed
03.called_peaks/leaf_rep1_H3K4me1_peaks_region_broad.bed
03.called_peaks/root_rep1_H3K36me3_peaks_region_broad.bed
03.called_peaks/leaf_rep2_H3K36me3_peaks_region_broad.bed
03.called_peaks/root_rep2_H3K4me1_peaks_region_broad.bed
My understanding is that snakemake populates the files combinations found in the rule all
section and then identify what steps need to be run upstream.
Any help would be greatly appreciated
Upvotes: 1
Views: 112
Reputation: 3691
You are right in your understanding; when no output is specified to snakemake it will find the first rule, and try to generate its output.
The problem is that rule all
specifies rules that can not be 'made'. I'll show a side-by-side comparison of the mistake:
rule all:
03.called_peaks/root_rep1_H3K4me1_peaks_region_broad.bed
rule run_bcp_peak_caller_broad:
output:
"03.called_peaks/{tissue}_{replicate}_{chip}_peaks_broad.bed"
See the difference? The files you say you want to generate end with peaks_region_broad.bed
, however your rules make output ending with peaks_broad.bed
.
Take a look again at rule all
and probably you want to remove the _region part of the strings.
Upvotes: 1