pabster212
pabster212

Reputation: 117

Snakemake issues running rules on two different subsets of wildcards

I am attempting to write a snakemake rule that will treat some files, based off their grouping differently than others. My file list is loaded in in a sample.tsv file.

I thought this would be relatively easy as I believed that populating the wildcards in the rule all would trigger execution of particular rules, but that does not appear to be the case.

Here is a paired down version of what I'm working on

List of sample files. Note that the chip category here is what becomes important for defining my groups

tissue replicate chip file
leaf rep2 input 00.data/chip_seq/input/leaf_input_rep2.fastq
leaf rep1 input 00.data/chip_seq/input/leaf_input_rep1.fastq
leaf rep2 H3K36me3 00.data/chip_seq/H3K36me3/leaf_H3K36me3_rep2.fastq
leaf rep1 H3K36me3 00.data/chip_seq/H3K36me3/leaf_H3K36me3_rep1.fastq
leaf rep1 H3K56ac 00.data/chip_seq/H3K56ac/leaf_H3K56ac_rep1.fastq                                                                                                        
leaf rep2 H3K56ac 00.data/chip_seq/H3K56ac/leaf_H3K56ac_rep2.fastq

In my script I have then divided these into two sub-categories

broad = ['H3K36me3']
narrow = ["H3K56ac"]
rule all:
    input:

#Align all reads
expand("02.unique_align/{tissue}_{chip}_{replicate}_unique_bowtie2_algn.bam", \
    ¦   ¦   ¦   tissue = samples['tissue'], replicate = samples['replicate'], \
    ¦   ¦   ¦   chip = samples['chip']),

#Should cause the expand on ONLY narrow groups, causing the below rule
# run_bcp_peak_caller_narrow to trigger 
expand("03.called_peaks/{tissue}_{replicate}_{chip}_peaks_region_narrow.bed",
    ¦   ¦   ¦   tissue = narrow_peaks['tissue'],
    ¦   ¦   ¦   replicate = narrow_peaks['replicate'],
    ¦   ¦   ¦   chip = narrow),

#Should cause the expand on ONLY narrow groups, causing the below rule
# run_bcp_peak_caller_broad to trigger

expand("03.called_peaks/{tissue}_{replicate}_{chip}_peaks_region_broad.bed",
    ¦   ¦   ¦   tissue = samples['tissue'],
    ¦   ¦   ¦   replicate = samples['replicate'],
    ¦   ¦   ¦   chip = broad)


## Two functions, one to get the input files, defined here as `get_input` the other to retrieve the chip files

def get_input(wildcards):
    z = glob.glob(os.path.join("02.unique_align/", (wildcards.tissue + "_" + \
    ¦   wildcards.replicate + "_" + "input_unique_bowtie2_algn.bam")))
    return z

def get_chip(wildcards):
    z = glob.glob(os.path.join("02.unique_align/", (wildcards.tissue + "_" + \
    ¦   ¦   ¦  wildcards.replicate + "_" + wildcards.chip + "_" + \
    ¦   ¦   ¦  "unique_bowtie2_algn.bam")))

    return z

rule run_bcp_peak_caller_broad:
    input:
    ¦   chip_input = get_input,
    ¦   chip_mod = get_chip
    params:
    ¦   "03.called_peaks/{tissue}_{replicate}_{chip}_peaks_broad"
    output:
    ¦   "03.called_peaks/{tissue}_{replicate}_{chip}_peaks_broad.bed"
    shell:"""
    peakranger bcp \
    --format bam \
    --verbose \
    --pval .001 \
    --data {input.chip_mod} \
    --control {input.chip_input} \
    --output {params}
    """

rule run_bcp_peak_caller_narrow:
    input:
        chip_input = get_input,
        chip_mod = get_chip
    params:
        "03.called_peaks/{tissue}_{replicate}_{chip}_peaks_narrow"
    output:
        "03.called_peaks/{tissue}_{replicate}_{chip}_peaks_narrow.bed"
    shell:"""
    peakranger \
    --format bam \
    --verbose \
    --pval .001 \
    --data {input.chip_mod} \
    --control {input.chip_input} \
    --output {params}
    """

Error is as follows:

MissingInputException in line 39 of /scratch/jpm73279/04.lncRNA/02.Analysis/24.regenerate_expression_peaks/Generate_peak_lists.snake:
Missing input files for rule all:
03.called_peaks/root_rep1_H3K4me1_peaks_region_broad.bed
03.called_peaks/root_rep2_H3K36me3_peaks_region_broad.bed
03.called_peaks/leaf_rep1_H3K4me1_peaks_region_broad.bed
03.called_peaks/root_rep1_H3K36me3_peaks_region_broad.bed
03.called_peaks/leaf_rep2_H3K36me3_peaks_region_broad.bed
03.called_peaks/root_rep2_H3K4me1_peaks_region_broad.bed

My understanding is that snakemake populates the files combinations found in the rule all section and then identify what steps need to be run upstream.

Any help would be greatly appreciated

Upvotes: 1

Views: 112

Answers (1)

Maarten-vd-Sande
Maarten-vd-Sande

Reputation: 3691

You are right in your understanding; when no output is specified to snakemake it will find the first rule, and try to generate its output.

The problem is that rule all specifies rules that can not be 'made'. I'll show a side-by-side comparison of the mistake:

rule all:
    03.called_peaks/root_rep1_H3K4me1_peaks_region_broad.bed

rule run_bcp_peak_caller_broad:
    output:
        "03.called_peaks/{tissue}_{replicate}_{chip}_peaks_broad.bed"

See the difference? The files you say you want to generate end with peaks_region_broad.bed, however your rules make output ending with peaks_broad.bed.

Take a look again at rule all and probably you want to remove the _region part of the strings.

Upvotes: 1

Related Questions