user3224522
user3224522

Reputation: 1151

Snakemake first genotype of a vcf file as wildcard in output

In the second rule I would like to select from the vcf file containing bob, clara and tim, only the first genotype of dictionary (i.e. bob) in roder to get as output in the second rule bob.dn.vcf. Is this possible in snakemake?

    d = {"FAM1": ["bob.bam", "clara.bam", "tim.bam"]}
    FAMILIES = list(d)
    
    rule all:
        input:
            expand some outputs
            
    wildcard_constraints:
        family = "|".join(FAMILIES)
    
    rule somerulename:
        input:
            lambda w: d[w.family]
        output:
            vcf="{family}/{family}.vcf"
        shell:
            """
            some python command line which produces a single vcf file with bob, clara and tim
            """
    
    rule somerulename:
        input:
            invcf="{family}/{family}.vcf"
        params:
            ref="someref.fasta"
        output:
            out="{family}/{bob}.dn.vcf"
        shell:
            """
            gatk --java-options "-Xms2G -Xmx2g -XX:ParallelGCThreads=2" SelectVariants -R {params.ref} -V {input.invcf} -O {output.out}
            """

Upvotes: 0

Views: 56

Answers (1)

SultanOrazbayev
SultanOrazbayev

Reputation: 16561

There are at least two options:

  1. explicitly specify output:
rule somerulename:
    output:
        out="FAM1/bob.dn.vcf"
  1. impose constraints on wildcard values:
rule somerulename:
    output:
        out="{family}/{bob}.dn.vcf"
    wildcard_constraints:
        family="FAM1",
        bob="bob",
  1. control what is produced by specifying appropriate inputs to rule all:
rule all:
    input: "FAM1/bob.dn.vcf", "FAM2/alice.dn.vcf"

Upvotes: 1

Related Questions