bgbrink

Reputation: 663

Snakemake - load cluster modules before an external script is called

In snakemake, you can call external scripts like so:

rule NAME:
    input:
        "path/to/inputfile",
        "path/to/other/inputfile"
    output:
        "path/to/outputfile",
        "path/to/another/outputfile"
    script:
        "path/to/script.R"

This gives convenient access to an S4 object named snakemake inside the R script. Now in my case, I am running snakemake on a SLURM cluster, and I need to load R with module load R/3.6.0 before an Rscript can be executed, otherwise the job will return:

/usr/bin/bash: Rscript: command not found

How can I tell snakemake to do that? If I use the shell directive instead of script, my R script unfortunately has no access to the snakemake object, so this is not the solution I want:

shell:
    "module load R/3.6.0;"
    "Rscript path/to/script.R"

Upvotes: 4

Views: 2348

Answers (3)

atongsa

Reputation: 346

Maybe you are looking for envmodules, a Snakemake directive that loads cluster modules for a rule, just like module load. It only takes effect when Snakemake is run with --use-envmodules:

rule your_rule:
    input:
    output:
    envmodules:
        "R/3.6.0"
    shell:
        "some Rscript"
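
To tie this back to the original question, here is a minimal sketch, assuming your Snakemake version is recent enough to support envmodules and that R/3.6.0 is the module name on your cluster. As far as I know, envmodules is also honored for rules that use script (though not for run blocks), so the R script should keep its access to the snakemake object. The paths are the placeholders from the question.

```snakemake
rule NAME:
    input:
        "path/to/inputfile",
        "path/to/other/inputfile"
    output:
        "path/to/outputfile",
        "path/to/another/outputfile"
    # Module(s) to load before the job runs; ignored unless
    # Snakemake is started with --use-envmodules
    envmodules:
        "R/3.6.0"
    # script (not shell), so the snakemake object stays available in R
    script:
        "path/to/script.R"
```

Then run it as snakemake .... --use-envmodules. If Rscript is still not found inside the SLURM job, check that the module command itself is available in the non-interactive shell your jobs start with.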

Upvotes: 2

Eric C.

Reputation: 3368

You cannot run a shell command from the script directive; for that you have to use the shell directive. You can still pass your inputs and outputs as arguments:

rule NAME:
    input:
        in1="path/to/inputfile",
        in2="path/to/other/inputfile"
    output:
        out1="path/to/outputfile",
        out2="path/to/another/outputfile"
    shell:
        """
        module load R/3.6.0
        Rscript path/to/script.R {input.in1} {input.in2} {output.out1} {output.out2}
        """

and get your arguments in the R script:

# Positional arguments, in the order given on the Rscript command line
args <- commandArgs(trailingOnly = TRUE)
inFile1 <- args[1]
inFile2 <- args[2]
outFile1 <- args[3]
outFile2 <- args[4]

Use of conda environment:

You can specify a conda environment to use for a specific rule:

rule NAME:
    input:
        in1="path/to/inputfile",
        in2="path/to/other/inputfile"
    output:
        out1="path/to/outputfile",
        out2="path/to/another/outputfile"
    conda: "r.yml"
    script:
        "path/to/script.R"

and in your r.yml file:

name: rEnv
channels:
  - r
dependencies:
  - r-base=3.6

Then when you run snakemake:

snakemake .... --use-conda

Snakemake will create all environments before running any jobs, and each environment will be activated inside the job sent to SLURM.

Upvotes: 4

dariober

Reputation: 9062

If your concern is to call the arguments by name in the Rscript command, you could have something like this (basically an extension of Eric's answer):

rule NAME:
    input:
        in1="path/to/inputfile",
        in2="path/to/other/inputfile"
    output:
        out1="path/to/outputfile",
        out2="path/to/another/outputfile"
    shell:
        r"""
        module load R/3.6.0
        Rscript path/to/script.R \
            inFile1={input.in1} inFile2={input.in2} \
            outFile1={output.out1} outFile2={output.out2}
        """

Then inside script.R you access each argument by parsing the command line:

args <- commandArgs(trailingOnly= TRUE)

for(x in args){
    if(grepl('^inFile1=', x)){
        inFile1 <- sub('^inFile1=', '', x)
    }
    else if(grepl('^inFile2=', x)){
        inFile2 <- sub('^inFile2=', '', x)
    }
    else if(grepl('^outFile1=', x)){
        outFile1 <- sub('^outFile1=', '', x)
    }
    else if(grepl('^outFile2=', x)){
        outFile2 <- sub('^outFile2=', '', x)
    }
    else {
        stop(sprintf('Unrecognized argument %s', x))
    }
}
# Do stuff with inFile1, inFile2, etc...

Also consider a library designed for parsing the command line; I'm quite happy with argparse for R.

Upvotes: 2
