user3224522
user3224522

Reputation: 1149

Snakemake integrate the multiple command lines in a rule

The output of my first command line "bcftools query -l {input.invcf} | head -n 1" prints the name of the first individual of vcf file (i.e. IND1). I want to use that output in selectvariants GATK in -sn IND1 option. How is it possible to integrate the 1st comamnd line in snakemake in order to use it's output in the next one?

rule selectvar:
    input:
        invcf="{family}_my.vcf"
    params:
        ind= ???
        ref="ref.fasta"
    output:
        out="{family}.dn.vcf"
    shell:
        """
        bcftools query -l {input.invcf} | head -n 1 > {params.ind}
        gatk --java-options "-Xms2G -Xmx2g -XX:ParallelGCThreads=2" SelectVariants -R {params.ref} -V {input.invcf} -sn {params.ind} -O {output.out}
        """

Upvotes: 3

Views: 693

Answers (3)

Daniel
Daniel

Reputation: 12016

When I have to do these things I prefer to use run instead of shell, and then shell out only at the end. The reason for this is because it makes it possible for snakemake to lint the run statement, and to exit early if something goes wrong instead of following through with a broken shell command.

rule selectvar:
    input:
        invcf="{family}_my.vcf"
    params:
        ref="ref.fasta"
        gatk_opts='--java-options "-Xms2G -Xmx2g -XX:ParallelGCThreads=2" SelectVariants'
    output:
        out="{family}.dn.vcf"
    run:
        opts = "{params.gatk_opts} -R {params.ref} -V {input.invcf} -O {output.out}"
        sn_parameter = shell("bcftools query -l {input.invcf} | head -n 1")
        # we could add a sanity check here if necessary, before shelling out
        shell(f"gatk {options} {sn_parameter}")
        """

Upvotes: 2

SultanOrazbayev
SultanOrazbayev

Reputation: 16551

There are several options, but the easiest one is to store the results into a temporary bash variable:

rule selectvar:
   ...
   shell:
        """
        myparam=$(bcftools query -l {input.invcf} | head -n 1)
        gatk -sn "$myparam" ...
        """

As noted by @dariober, if one modifies pipefail behaviour, there could be unexpected results, see the example in this answer.

Upvotes: 3

user3224522
user3224522

Reputation: 1149

I think I found a solution:

rule selectvar:
    input:
        invcf="{family}_my.vcf"
    params:
        ref="ref.fasta"
    output:
        out="{family}.dn.vcf"
    shell:
        """
        gatk --java-options "-Xms2G -Xmx2g -XX:ParallelGCThreads=2" SelectVariants -R {params.ref} -V {input.invcf} -sn `bcftools query -l {input.invcf} | head -n 1` -O {output.out}
        """

Upvotes: 1

Related Questions