Daniel Standage

Reputation: 8304

Access nested config variables in a rule

I am new to Snakemake and trying to figure out whether, and how, nested configuration values work. I've created the following config file...

# dummyconfig.json
{
    "fam1": {
        "numchr": 1,
        "chrlen": 2500000,
        "seeds": {
            "genome": 8013785666,
            "simtrio": 1776,
            "simseq": {
                "mother": 2053695854357871005,
                "father": 4517457392071889495,
                "proband": 2574020394472462046
            }
        },
        "ninherited": 100,
        "ndenovo": 5,
        "numreads": 375000
    }
}

...to go along with this rule (among others) in my Snakefile.

# Snakefile
rule simgenome:
    input:
        "human.order6.mm",
    output:
        "{family}-refr.fa.gz"
    shell:
        "nuclmm simulate --out - --order 6 --numseqs {config[wildcards.family][numchr]} --seqlen {config[wildcards.family][chrlen]} --seed {config[wildcards.family][seeds][genome]} {input} | gzip -c > {output}"

I would then like to create fam1-refr.fa.gz by invoking snakemake --configfile dummyconfig.json fam1-refr.fa.gz. When I do so, I get the following error message.

Building DAG of jobs...

rule simgenome:
    input: human.order6.mm
    output: fam1-refr.fa.gz
    jobid: 0
    wildcards: family=fam1

RuleException in line 1 of /Users/standage/Projects/noble/Snakefile:
NameError: The name 'wildcards.family' is unknown in this context. Please make sure that you defined that variable. Also note that braces not used for variable access have to be escaped by repeating them, i.e. {{print $1}}

So fam1 is being correctly recognized as the value of the family wildcard, but it doesn't appear that variable accesses like {config[wildcards.family][numchr]} work.

Is it possible to traverse a nested configuration in this manner, or does Snakemake only support access of top-level variables?

Upvotes: 1

Views: 640

Answers (1)

Luiz Irber

Reputation: 61

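One way to solve this is to use params and resolve the nested values outside the shell string. Inside the shell string, the text between square brackets appears to be used as a literal key (which is why the error names 'wildcards.family' as unknown) rather than being evaluated as an expression. A params function, by contrast, receives the wildcards object and can index config with ordinary Python.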

rule simgenome:
    input:
        "human.order6.mm",
    output:
        "{family}-refr.fa.gz"
    params:
        seed=lambda w: config[w.family]['seeds']['genome'],
        numseqs=lambda w: config[w.family]['numchr'],
        seqlen=lambda w: config[w.family]['chrlen']
    shell:
        "nuclmm simulate --out - --order 6 --numseqs {params.numseqs} --seqlen {params.seqlen} --seed {params.seed} {input} | gzip -c > {output}"
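The difference can be reproduced in plain Python, outside Snakemake. This is only a sketch: it assumes Snakemake's shell formatting follows the same bracket rules as Python's str.format, which the error message above suggests, and it uses a SimpleNamespace as a hypothetical stand-in for the wildcards object that Snakemake passes to params functions.

```python
from types import SimpleNamespace

# A trimmed copy of the config from the question.
config = {
    "fam1": {
        "numchr": 1,
        "chrlen": 2500000,
        "seeds": {"genome": 8013785666},
    }
}

# In a format field, the text between brackets is a literal string key,
# so "wildcards.family" is looked up as-is and never evaluated.
try:
    "{config[wildcards.family][numchr]}".format(config=config)
    failed_key = None
except KeyError as err:
    failed_key = err.args[0]

# Literal nested keys do work inside a format field.
literal = "{config[fam1][seeds][genome]}".format(config=config)

# Hypothetical stand-in for Snakemake's wildcards object (an assumption
# for illustration only; Snakemake supplies the real one).
wildcards = SimpleNamespace(family="fam1")

# The same kind of function used under params: ordinary Python indexing,
# so the wildcard value can select the top-level key.
get_seed = lambda w: config[w.family]["seeds"]["genome"]
seed = get_seed(wildcards)

print(failed_key, literal, seed)
```

So nested access itself is not the problem; only the wildcard-dependent key needs to be resolved in Python code, which is what the params functions do.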

Upvotes: 1
