Awa

Reputation: 1

Wildcards and regular expressions in Snakemake

I'm trying to create wildcards from the names of the directories output by rule ReferenceDatabase, which creates the "Campylobacter/Gene_Flow/ReferenceDatabase/{dirname}" folders (cluster1, cluster2, ... correspond to the dirname wildcard). However, I cannot know how many "cluster" directories will be created the first time this rule runs. So I tried writing the Snakefile as below:

import glob

# Need sample name and dirname
SAMPLES, = glob_wildcards("Campylobacter/core_genome/core/{sample}.fa.align")

dirnames, = glob_wildcards("Campylobacter/Gene_Flow/ReferenceDatabase/{dirname}", "Campylobacter/Gene_Flow/DatabaseQuery/{dirname}/{dirname}")

wildcard_constraints:
    dirname="cluster[0-9]+"

rule all:
    input:
        distmat_out = "Campylobacter/ANI_results/ani/ani.distmat",
        parse_distances_out = "Campylobacter/ANI_results/genome_pairs.csv",
        cluster_genomes_out = "Campylobacter/ANI_results/cluster_genomes.csv",
        liste_genomes = expand("Campylobacter/Gene_Flow/ReferenceDatabase/{dirname}/path_to_genome_list.txt", dirname=dirnames),
        core_genome_within_species = expand("Campylobacter/Gene_Flow/ReferenceDatabase/{dirname}/core_genome/concat.fa", dirname=dirnames),
        distances_between_genomes_r = expand("Campylobacter/Gene_Flow/ReferenceDatabase/{dirname}/core_genome/distances.dist", dirname=dirnames)

rule define_ANI_species:
    input:
        fasta = "Campylobacter/core_genome/concat.fa",
        dir = "Campylobacter"
    output:
        distmat = "Campylobacter/ANI_results/ani/ani.distmat",
        parse_distances = "Campylobacter/ANI_results/genome_pairs.csv",
        cluster_genomes = "Campylobacter/ANI_results/cluster_genomes.csv",
    shell:
        """
        mkdir -p Campylobacter/ANI_results/ani
        distmat  -sequence {input.fasta} -nucmethod 0 -outfile {output.distmat}
        python pipelines/ANI/parse_distances.py {input.dir}
        python pipelines/ANI/cluster_genomes.py {input.dir}
        """

rule ReferenceDatabase:
    input:
        cluster_genomes = "Campylobacter/ANI_results/cluster_genomes.csv",
        dir = "Campylobacter"
    output:
        liste = "Campylobacter/Gene_Flow/ReferenceDatabase/{dirname}/path_to_genome_list.txt"
    shell:
        "python pipelines/ConSpecifix/create_Refdb.py {input.dir}"

rule core_genome_within_species:
    input:
        dir = "Campylobacter/genomes",
        liste = "Campylobacter/Gene_Flow/ReferenceDatabase/{dirname}/path_to_genome_list.txt"
    output:
        fasta = "Campylobacter/Gene_Flow/ReferenceDatabase/{dirname}/core_genome/concat.fa",
        family = "Campylobacter/Gene_Flow/ReferenceDatabase/{dirname}/core_genome/families_core.txt"
    params:
        dir = directory("Campylobacter/Gene_Flow/ReferenceDatabase/{dirname}/core_genome")
    shell:
        "python pipelines/CoreCruncher/corecruncher_master.py -in {input.dir} -out {params.dir} -list {input.liste} -freq 85 -prog usearch -ext .fa -length 80 -score 70 -align mafft"

I got this error:

rule ReferenceDatabase:
input: Campylobacter/ANI_results/genome_clusters.csv, Campylobacter
output: Campylobacter/Gene_Flow/ReferenceDatabase/cluster[0-9]+/path_to_genome_list.txt
jobid: 18
wildcards: dirname=cluster[0-9]+
Waiting at most 5 seconds for missing files.
MissingOutputException in line 171 of /Users/home//Bioinformatic_tool/Snakefile:
Job completed successfully, but some output files are missing. Missing files after 5 seconds:
Campylobacter/Gene_Flow/ReferenceDatabase/cluster[0-9]+/path_to_genome_list.txt
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.

It seems that Snakemake does not recognize the regex "[0-9]+". Is there a wildcard for an integer that I can use to match cluster1, cluster2, cluster3, ... (directory1, directory2, directory3, ...)?

Upvotes: 0

Views: 53

Answers (1)

Bohemian

Reputation: 425278

Only use a wildcard for the file number:

"Campylobacter/Gene_Flow/ReferenceDatabase/cluster{num}"
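
To illustrate what restricting the wildcard to the number does, here is a rough stdlib-only sketch of how a pattern such as cluster{num}, with a digits-only constraint, picks out cluster1, cluster2, and so on from a list of paths. It only mimics the idea and is not Snakemake's actual implementation; the helper pattern_to_regex and the example paths (including clusterX) are hypothetical, made up for this illustration.

```python
import re


def pattern_to_regex(pattern, constraints):
    """Hypothetical helper: convert a pattern with {name} placeholders into
    a compiled regex, using the given per-wildcard constraint if present."""
    # re.split with a capture group yields: literal, name, literal, name, ...
    parts = re.split(r"\{(\w+)\}", pattern)
    pieces = []
    for i, part in enumerate(parts):
        if i % 2 == 0:
            pieces.append(re.escape(part))        # literal path text
        else:
            constraint = constraints.get(part, ".+")
            pieces.append(f"(?P<{part}>{constraint})")  # named wildcard
    return re.compile("^" + "".join(pieces) + "$")


# Only the number is a wildcard; the "cluster" prefix is literal text.
pattern = (
    "Campylobacter/Gene_Flow/ReferenceDatabase/"
    "cluster{num}/path_to_genome_list.txt"
)
regex = pattern_to_regex(pattern, {"num": "[0-9]+"})

# Made-up example paths; clusterX should be rejected by the constraint.
paths = [
    "Campylobacter/Gene_Flow/ReferenceDatabase/cluster1/path_to_genome_list.txt",
    "Campylobacter/Gene_Flow/ReferenceDatabase/cluster2/path_to_genome_list.txt",
    "Campylobacter/Gene_Flow/ReferenceDatabase/clusterX/path_to_genome_list.txt",
]

nums = []
for path in paths:
    match = regex.match(path)
    if match:
        nums.append(match.group("num"))

print(nums)
```

In a Snakefile the same idea would mean writing cluster{num} in the rule paths and constraining num, rather than putting the regex into the value of dirname.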

Upvotes: -1
