Reputation: 663
In snakemake, you can call external scripts like so:
rule NAME:
    input:
        "path/to/inputfile",
        "path/to/other/inputfile"
    output:
        "path/to/outputfile",
        "path/to/another/outputfile"
    script:
        "path/to/script.R"
This gives convenient access to an S4 object named snakemake inside the R script.
Now in my case, I am running snakemake on a SLURM cluster, and I need to load R with module load R/3.6.0 before an Rscript can be executed; otherwise the job will return:
/usr/bin/bash: Rscript: command not found
How can I tell snakemake to do that? If I run the rule as a shell instead of a script, my R script unfortunately has no access to the snakemake object, so this is not a desirable solution:
    shell:
        "module load R/3.6.0;"
        "Rscript path/to/script.R"
Upvotes: 4
Views: 2348
Reputation: 346
Maybe you are looking for envmodules, a Snakemake directive that loads cluster environment modules, just like module load:
rule your_rule:
    input:
    output:
    envmodules:
        "R/3.6.0"
    shell:
        "some Rscript"
Upvotes: 2
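As far as I can tell, envmodules is not limited to shell rules and can also be combined with the script directive, which would keep the snakemake object available inside the R script. Below is a minimal sketch, assuming a Snakemake version that supports envmodules and a workflow started with the --use-envmodules flag (without that flag the directive is ignored). The module name R/3.6.0 and the paths are taken from the question.

```snakemake
rule NAME:
    input:
        "path/to/inputfile",
        "path/to/other/inputfile"
    output:
        "path/to/outputfile",
        "path/to/another/outputfile"
    # Modules are loaded in the order listed, but only when
    # snakemake is invoked with --use-envmodules.
    envmodules:
        "R/3.6.0"
    script:
        "path/to/script.R"
```

You would then run snakemake --use-envmodules together with whatever cluster options you already use.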
Reputation: 3368
You cannot call a shell command using the script tag. You definitely have to use the shell tag. You can always add your inputs and outputs as arguments:
rule NAME:
    input:
        in1="path/to/inputfile",
        in2="path/to/other/inputfile"
    output:
        out1="path/to/outputfile",
        out2="path/to/another/outputfile"
    shell:
        """
        module load R/3.6.0
        Rscript path/to/script.R {input.in1} {input.in2} {output.out1} {output.out2}
        """
and get your arguments in the R script:
args=commandArgs(trailingOnly=TRUE)
inFile1=args[1]
inFile2=args[2]
outFile1=args[3]
outFile2=args[4]
Use of conda environment:
You can specify a conda environment to use for a specific rule:
rule NAME:
    input:
        in1="path/to/inputfile",
        in2="path/to/other/inputfile"
    output:
        out1="path/to/outputfile",
        out2="path/to/another/outputfile"
    conda: "r.yml"
    script:
        "path/to/script.R"
and in your r.yml file:
name: rEnv
channels:
  - r
dependencies:
  - r-base=3.6
Then when you run snakemake:
snakemake .... --use-conda
Snakemake will install all environments prior to running, and each environment will be activated inside the job sent to SLURM.
Upvotes: 4
Reputation: 9062
If your concern is to call the arguments by name in the Rscript command, you could have something like this (basically an extension of Eric's answer):
rule NAME:
    input:
        in1="path/to/inputfile",
        in2="path/to/other/inputfile"
    output:
        out1="path/to/outputfile",
        out2="path/to/another/outputfile"
    shell:
        r"""
        module load R/3.6.0
        Rscript path/to/script.R \
            inFile1={input.in1} inFile2={input.in2} \
            outFile1={output.out1} outFile2={output.out2}
        """
Then inside script.R you access each argument by parsing the command line:
args <- commandArgs(trailingOnly = TRUE)
for (x in args) {
    if (grepl('^inFile1=', x)) {
        inFile1 <- sub('^inFile1=', '', x)
    } else if (grepl('^inFile2=', x)) {
        inFile2 <- sub('^inFile2=', '', x)
    } else if (grepl('^outFile1=', x)) {
        outFile1 <- sub('^outFile1=', '', x)
    } else if (grepl('^outFile2=', x)) {
        outFile2 <- sub('^outFile2=', '', x)
    } else {
        stop(sprintf('Unrecognized argument %s', x))
    }
}
# Do stuff with inFile1, inFile2, etc...
Also consider using a library designed for parsing the command line; I'm quite happy with argparse for R.
Upvotes: 2