Snakemake WorkflowError: Target rules may not contain wildcards

Question

rule all:
    input:
        "../data/A_checkm/{genome}"

rule A_checkm:
    input:
        "../data/genomesFna/{genome}_genomic.fna.gz"
    output:
        directory("../data/A_checkm/{genome}")
    threads:
        16
    resources:
        mem_mb = 40000
    shell:
        """
        # setup a tmp working dir
        tmp=$(mktemp -d)
        mkdir $tmp/ref
        cp {input} $tmp/ref/genome.fna.gz
        cd $tmp/ref
        gunzip -c genome.fna.gz > genome.fna
        cd $tmp

        # run checking
        checkm lineage_wf -t {threads} -x fna ref out > stdout

        # prepare output folder
        cd {config[project_root]}
        mkdir -p {output}
        # copy results over
        cp -r $tmp/out/* {output}/
        cp $tmp/stdout {output}/checkm.txt
        # cleanup
        rm -rf $tmp
        """

Thank you in advance for your help! I would like to run CheckM on a list of ~600 downloaded genome files with the extension '.fna.gz'. Each downloaded file is saved in a separate folder with the same name as the genome. I would also like to have the results for each genome in a separate folder, which is why my output is a directory. When I run this code with 'snakemake -s Snakefile --cores 10 A_checkm', I get the following error:

WorkflowError: Target rules may not contain wildcards. Please specify concrete files or a rule without wildcards at the command line, or have a rule without wildcards at the very top of your workflow (e.g. the typical "rule all" which just collects all results you want to generate in the end).

Could anyone help me identify the error, please?

euronion · Accepted Answer

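You need to provide Snakemake with concrete values for the {genome} wildcard. Snakemake works backwards from the files you ask for: it matches each requested path against the output patterns of your rules to work out the wildcard values. If the target itself still contains {genome} (as your rule all does, and as A_checkm does when you name it on the command line), there is nothing to infer the value from. Snakemake will not simply run the rule on every file it finds in some folder of your project.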
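To illustrate the matching step, here is a simplified pure-Python sketch. The helper name infer_wildcards is hypothetical and this is not Snakemake's actual implementation, which also handles wildcard constraints, repeated wildcards and more. It only shows that a wildcard value can be recovered from a concrete path that fits an output pattern:

```python
import re
from typing import Dict, Optional


def infer_wildcards(pattern: str, path: str) -> Optional[Dict[str, str]]:
    """Hypothetical helper: match a concrete path against an output pattern.

    Literal parts of the pattern are escaped, and each {name} becomes a named
    regex group matching one or more characters (roughly Snakemake's default).
    Returns the inferred wildcard values, or None if the path does not match.
    Assumes each wildcard name appears only once in the pattern.
    """
    regex = ""
    pos = 0
    for m in re.finditer(r"\{(\w+)\}", pattern):
        regex += re.escape(pattern[pos:m.start()])
        regex += f"(?P<{m.group(1)}>.+)"
        pos = m.end()
    regex += re.escape(pattern[pos:])
    match = re.fullmatch(regex, path)
    return match.groupdict() if match else None


output_pattern = "../data/A_checkm/{genome}"

# A concrete requested path lets the value of {genome} be inferred.
print(infer_wildcards(output_pattern, "../data/A_checkm/genome_01"))
# -> {'genome': 'genome_01'}

# A path that does not fit the pattern yields nothing to infer.
print(infer_wildcards(output_pattern, "../data/other/genome_01"))
# -> None
```

The genome name genome_01 is made up for the example.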

1. Determine the {genome} values of the files you want to work on, using glob_wildcards(...). See the Snakemake documentation for further details.
2. Use these values in rule all (via expand) to request all output folders. Snakemake will then run your other rule once for each {genome} value:

# Determine the {genome} values of all downloaded files
(GENOMES,) = glob_wildcards("../data/genomesFna/{genome}_genomic.fna.gz")


rule all:
    input:
        expand("../data/A_checkm/{genome}", genome=GENOMES),


rule A_checkm:
    input:
        "../data/genomesFna/{genome}_genomic.fna.gz",
    output:
        directory("../data/A_checkm/{genome}"),
    threads: 16
    resources:
        mem_mb=40000,
    shell:
        """
        # Your magic goes here (e.g. the shell commands from the question)
        """

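Then run the workflow without naming a target rule, e.g. 'snakemake -s Snakefile --cores 10'. Snakemake uses the first rule in the Snakefile (here rule all) as the default target. Naming A_checkm on the command line would raise the same error again, because that rule's output still contains the {genome} wildcard.

If the download is supposed to happen inside Snakemake, add a checkpoint for it. Have a look at this answer in that case.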
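If it helps to see what the glob_wildcards/expand combination in rule all computes, here is a rough pure-Python equivalent. The functions glob_genomes and expand_targets are hypothetical stand-ins, not Snakemake's API. Unlike the real glob_wildcards, this sketch only looks at one directory (no subfolders) and sorts the names so the result is deterministic:

```python
import os
import re
import tempfile
from typing import List


def glob_genomes(directory: str) -> List[str]:
    """Hypothetical stand-in for
    glob_wildcards("../data/genomesFna/{genome}_genomic.fna.gz"):
    collect the {genome} part of every file name in `directory` that
    matches the pattern (non-recursive, sorted for reproducibility)."""
    pattern = re.compile(r"(?P<genome>.+)_genomic\.fna\.gz")
    genomes = []
    for name in sorted(os.listdir(directory)):
        m = pattern.fullmatch(name)
        if m:
            genomes.append(m.group("genome"))
    return genomes


def expand_targets(template: str, genomes: List[str]) -> List[str]:
    """Hypothetical stand-in for expand(template, genome=genomes)
    with a single wildcard: one concrete path per genome."""
    return [template.format(genome=g) for g in genomes]


# Demo in a throwaway directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["genome_01_genomic.fna.gz", "genome_02_genomic.fna.gz", "notes.txt"]:
        open(os.path.join(tmp, name), "w").close()
    GENOMES = glob_genomes(tmp)
    targets = expand_targets("../data/A_checkm/{genome}", GENOMES)

print(GENOMES)  # ['genome_01', 'genome_02']
print(targets)  # ['../data/A_checkm/genome_01', '../data/A_checkm/genome_02']
```

Each entry in targets is a concrete path, so Snakemake can match it against the output of A_checkm and infer {genome} for every job.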
