Calling variables for a rule individually and adding an independent environment for a specific rule

Question

I need to run a Snakemake workflow on a cluster, and some rules need tools and libraries to be loaded that are exclusive to those rules and independent of the others. In this case, how can I specify them in my Snakemake rule? For example, for rule score I need to run module load r/3.5.1 and export R_lib=/user/tools/software. Currently, I run these lines separately on the command line before running snakemake, but it would be great if there were a way to do it within the rule, as an environment.

Second question

I have the following rule:

rule score:
    input:
        count=os.path.join(config['general']['paths']['outdir'], 'count_expression', '{sample}.tsv'),
        libsize=os.path.join(config['general']['paths']['outdir'], 'count_expression', '{sample}.size_tsv')
    params:
        result_dir=os.path.join(config['general']['paths']['outdir'], 'score'),
        cancertype=config['general']['paths']['cancertype'],
        sample_id=expand('{sample}',sample=samples['sample'].unique())
    output:
        files=os.path.join(config['general']['paths']['outdir'], 'score', '{sample}_bg_scores.tsv', '{sample}_tp_scores.tsv')
    shell:
        'mkdir -p {params.result_dir};Rscript {config[general][paths][tool]} {params.result_dir} {params.cancertype} {params.sample_id} {input.count} {input.libsize}'

The actual behavior of the above snippet is:

shell:
        mkdir -p /cluster/user/snakemake_test/results_april30/score;Rscript /cluster/home/user/Projects/R_scripts/scoretool.R /cluster/user/snakemake_test/results_april30/score DMC GNMS4 MRT5T /cluster/projects/test/results/exp/MRT5T.tsv /cluster/projects/test/results/Exp/MRT5T.size.tsv

The expected behavior, however, is:

shell:
        mkdir -p /cluster/user/snakemake_test/results_april30/score;Rscript /cluster/home/user/Projects/R_scripts/scoretool.R /cluster/user/snakemake_test/results_april30/score DMC MRT5T /cluster/projects/test/results/exp/MRT5T.tsv /cluster/projects/test/results/Exp/MRT5T.size.tsv

and for the second sample,

shell:
        mkdir -p /cluster/user/snakemake_test/results_april30/score;Rscript /cluster/home/user/Projects/R_scripts/scoretool.R /cluster/user/snakemake_test/results_april30/score DMC GNMS4 /cluster/projects/test/results/exp/GNMS4.tsv /cluster/projects/test/results/Exp/GNMS4.size.tsv

I need the values of sample_id, ['GNMS4', 'MRT5T'], to be passed separately, one per shell command, not together in a single command line.

bli · Accepted Answer

Regarding your first question: You can put whatever module load or export commands you like in the shell section of a rule.
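
As a rough illustration of why this works: each rule's shell block runs in its own shell process, so a variable exported there is visible to the commands in that block but does not leak into other rules. The sketch below imitates this with plain Python and subprocess rather than Snakemake itself. It assumes a POSIX sh is available; R_lib and its path are simply the values from the question, and run_block is a hypothetical helper. It also shows that export does not accept a space before the = sign.

```python
import os
import subprocess

# Start from the current environment, minus R_lib, so the demo is reproducible.
base_env = {k: v for k, v in os.environ.items() if k != "R_lib"}

def run_block(script):
    """Run a multi-line script in a fresh shell, like one rule's shell block."""
    return subprocess.run(
        ["sh", "-c", script], capture_output=True, text=True, env=base_env
    )

# A rule that sets up its own environment before calling its tool.
with_env = run_block("""
export R_lib=/user/tools/software
echo "R_lib is: $R_lib"
""")

# Another rule: a separate shell, so the variable is not set here.
without_env = run_block('echo "R_lib is: ${R_lib:-unset}"')

# A space before '=' makes export fail instead of setting the variable.
bad_export = run_block("export R_lib =/user/tools/software")

print(with_env.stdout.strip())
print(without_env.stdout.strip())
print("export with a space failed:", bad_export.returncode != 0)
```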

Regarding your second question, you should probably not use expand in the params section of your rule. In expand('{sample}',sample=samples['sample'].unique()) you are actually not using the value of the sample wildcard, but generating a list of all unique values in samples['sample']. That list is the same for every job, which is why both sample names end up in each command. You probably just need to use wildcards.sample in the definition of your shell command instead of using a params element.
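
The difference can be sketched in plain Python, without Snakemake. Here expand_sample is a hypothetical stand-in that only imitates what expand does for a single wildcard, and the command strings are shortened versions of the ones in the question.

```python
samples = ["GNMS4", "MRT5T"]  # the two sample names from the question

def expand_sample(pattern, sample):
    """Imitate expand(pattern, sample=[...]): one string per listed value."""
    return [pattern.format(sample=s) for s in sample]

# What the params entry holds: every sample, regardless of the current job.
all_ids = expand_sample("{sample}", sample=samples)

# A list placed into a shell command is rendered space-separated,
# so each job receives both sample names.
command_with_params = "Rscript scoretool.R DMC " + " ".join(all_ids)

# The wildcard, by contrast, holds only the sample of the current job.
commands_with_wildcard = [
    "Rscript scoretool.R DMC {sample}".format(sample=s) for s in samples
]

print(command_with_params)
for command in commands_with_wildcard:
    print(command)
```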

If you want to run several instances of the score rule based on possible values for sample, you need to "drive" this using another rule that wants the output of score as its input.
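
To make the idea of "driving" more concrete, here is a simplified sketch of how requested target files translate into one job per sample. It is not Snakemake's actual matching code, only an imitation with a regular expression, and the output pattern results/score/{sample}_scores.tsv is a hypothetical placeholder.

```python
import re

# Hypothetical output pattern of the score rule.
output_pattern = "results/score/{sample}_scores.tsv"

# What the driving rule asks for: one concrete file per sample.
targets = [output_pattern.format(sample=s) for s in ("GNMS4", "MRT5T")]

# Turn the pattern into a regex with a named group for the wildcard.
regex = re.compile(
    re.escape(output_pattern).replace(re.escape("{sample}"), "(?P<sample>.+)")
)

# Each requested file matches the pattern with a different wildcard value,
# so each one becomes a separate job of the score rule.
jobs = [regex.fullmatch(target).group("sample") for target in targets]

for target, sample in zip(targets, jobs):
    print(f"{target} -> job with wildcards.sample = {sample}")
```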

Note that to improve readability, you can use Python's multi-line (triple-quoted) strings for the shell command.

To sum up, you might try something like this:

rule all:
    input:
        expand(
            os.path.join(
                config['general']['paths']['outdir'],
                'score',
                '{sample}_bg_scores.tsv',
                '{sample}_tp_scores.tsv'),
            sample=samples['sample'].unique())

rule score:
    input:
        count = os.path.join(
             config['general']['paths']['outdir'],
             'count_expression', '{sample}.tsv'),
        libsize = os.path.join(
             config['general']['paths']['outdir'],
             'count_expression', '{sample}.size_tsv')
    params:
        result_dir = os.path.join(config['general']['paths']['outdir'], 'score'),
        cancertype = config['general']['paths']['cancertype'],
    output:
        files = os.path.join(
            config['general']['paths']['outdir'],
            'score', '{sample}_bg_scores.tsv', '{sample}_tp_scores.tsv')
    shell:
        """
        module load r/3.5.1
        export R_lib=/user/tools/software
        mkdir -p {params.result_dir}
        Rscript {config[general][paths][tool]} {params.result_dir} {params.cancertype} {wildcards.sample} {input.count} {input.libsize}
        """
