Reputation: 2080
I need to run a Snakemake workflow on a cluster. Some rules need specific tools and libraries to be loaded, and these are independent of (or exclusive to) the other rules. In this case, how can I specify them in my Snakemake rule? For example, for rule score I need to run module load r/3.5.1 and export R_lib=/user/tools/software. Currently, I run these lines separately on the command line before running snakemake, but it would be great if there were a way to do it within the rule, as part of its environment.
I have the following rule:
rule score:
    input:
        count=os.path.join(config['general']['paths']['outdir'], 'count_expression', '{sample}.tsv'),
        libsize=os.path.join(config['general']['paths']['outdir'], 'count_expression', '{sample}.size_tsv')
    params:
        result_dir=os.path.join(config['general']['paths']['outdir'], 'score'),
        cancertype=config['general']['paths']['cancertype'],
        sample_id=expand('{sample}',sample=samples['sample'].unique())
    output:
        files=os.path.join(config['general']['paths']['outdir'], 'score', '{sample}_bg_scores.tsv', '{sample}_tp_scores.tsv')
    shell:
        'mkdir -p {params.result_dir};Rscript {config[general][paths][tool]} {params.result_dir} {params.cancertype} {params.sample_id} {input.count} {input.libsize}'
The command actually generated by the above snippet is:
shell:
mkdir -p /cluster/user/snakemake_test/results_april30/score;Rscript /cluster/home/user/Projects/R_scripts/scoretool.R /cluster/user/snakemake_test/results_april30/score DMC GNMS4 MRT5T /cluster/projects/test/results/exp/MRT5T.tsv /cluster/projects/test/results/Exp/MRT5T.size.tsv
Whereas, the expected behavior is:
shell:
mkdir -p /cluster/user/snakemake_test/results_april30/score;Rscript /cluster/home/user/Projects/R_scripts/scoretool.R /cluster/user/snakemake_test/results_april30/score DMC MRT5T /cluster/projects/test/results/exp/MRT5T.tsv /cluster/projects/test/results/Exp/MRT5T.size.tsv
and for the second sample,
shell:
mkdir -p /cluster/user/snakemake_test/results_april30/score;Rscript /cluster/home/user/Projects/R_scripts/scoretool.R /cluster/user/snakemake_test/results_april30/score DMC GNMS4 /cluster/projects/test/results/exp/GNMS4.tsv /cluster/projects/test/results/Exp/GNMS4.size.tsv
I need the values of the variable sample_id, ['GNMS4', 'MRT5T'], to be used one at a time, not together in a single shell command line.
Upvotes: 1
Views: 286
Reputation: 8184
Regarding your first question: you can put whatever module load or export commands you like in the shell section of a rule.
Regarding your second question, you should probably not use expand in the params section of your rule. In expand('{sample}',sample=samples['sample'].unique()) you are not actually using the value of the sample wildcard, but generating a list of all unique values in samples['sample']. You probably just need to use wildcards.sample in the definition of your shell command instead of using a params element.
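To see why both sample names end up in every command, here is a minimal plain-Python sketch. It does not call Snakemake: expand_all is a hypothetical stand-in that only mimics how expand fills a pattern with every supplied value, and the sample names are the two from the question.

```python
# Hypothetical stand-in for Snakemake's expand(); plain Python only.
def expand_all(pattern, **values):
    """Fill `pattern` once for every option of the single keyword given."""
    key, options = next(iter(values.items()))
    return [pattern.format(**{key: option}) for option in options]


samples = ["GNMS4", "MRT5T"]  # sample names taken from the question

# What the params entry in the question evaluates to: every sample at once,
# regardless of which sample the current job is for.
sample_id = expand_all("{sample}", sample=samples)

# A list placed in the shell command is rendered space-separated, which
# matches the "GNMS4 MRT5T" seen in the actual command in the question.
rendered = " ".join(sample_id)
print(rendered)
```

With wildcards.sample, by contrast, each job only sees the single value that was matched for its own output file.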
If you want to run several instances of the score rule based on possible values for sample, you need to "drive" this using another rule that wants the output of score as its input.
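The idea of "driving" can be sketched in plain Python as follows. This is a simplified illustration, not Snakemake's actual matching code: the output pattern and the infer_sample helper are hypothetical, and the point is only that each requested file is matched against the output pattern, which yields one wildcard value (and so one job) per file.

```python
import re

# Hypothetical, shortened output pattern for the score rule.
output_pattern = "score/{sample}_tp_scores.tsv"


def infer_sample(target):
    """Return the sample name a requested file implies, or None if no match."""
    # Turn the pattern into a regex with one named group for the wildcard.
    regex = re.escape(output_pattern).replace(
        re.escape("{sample}"), "(?P<sample>.+)"
    )
    match = re.fullmatch(regex, target)
    return match.group("sample") if match else None


samples = ["GNMS4", "MRT5T"]  # sample names taken from the question

# The files a driving rule would request as its input, one per sample.
targets = [output_pattern.format(sample=s) for s in samples]

# Each requested file maps back to exactly one wildcard value, i.e. one job.
jobs = [infer_sample(t) for t in targets]
print(jobs)
```

Each entry in jobs corresponds to one separate run of the rule, with only that sample in its command.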
Note that to improve readability, you can use Python's multi-line (triple-quoted) strings.
To sum up, you might try something like this:
# Target rule: requests one score output per sample, which drives rule score.
rule all:
    input:
        expand(
            os.path.join(
                config['general']['paths']['outdir'],
                'score',
                '{sample}_bg_scores.tsv',
                '{sample}_tp_scores.tsv'),
            sample=samples['sample'].unique())

rule score:
    input:
        count = os.path.join(
            config['general']['paths']['outdir'],
            'count_expression', '{sample}.tsv'),
        libsize = os.path.join(
            config['general']['paths']['outdir'],
            'count_expression', '{sample}.size_tsv')
    params:
        result_dir = os.path.join(config['general']['paths']['outdir'], 'score'),
        cancertype = config['general']['paths']['cancertype'],
    output:
        files = os.path.join(
            config['general']['paths']['outdir'],
            'score', '{sample}_bg_scores.tsv', '{sample}_tp_scores.tsv')
    # No space around "=" in the export line, otherwise the shell rejects it.
    shell:
        """
        module load r/3.5.1
        export R_lib=/user/tools/software
        mkdir -p {params.result_dir}
        Rscript {config[general][paths][tool]} {params.result_dir} {params.cancertype} {wildcards.sample} {input.count} {input.libsize}
        """
Upvotes: 1
Reputation: 4089
onstart would work, I think. Note that dry runs don't trigger this handler, which is acceptable in your scenario.
onstart:
    shell("load tools")
A for loop should solve the problem. However, if you want each sample to be run as a separate job, you would have to use the sample name as part of the output filename.
shell:
    '''
    for sample in {params.sample_id}
    do
        your command $sample
    done
    '''
Upvotes: 0