user3224522

Reputation: 1149

Snakemake using the first argument of a list as a wildcard

I am trying to run an analysis in Snakemake in which the proband is always the first BAM file in each family's list, i.e. NUM_194 for FAM_194 and NUM_123 for FAM_123. Is there a way to use the identifier of the first BAM file in each list of d as the value in the proband line below?

d = {"FAM_194": ["path/to/NUM_194/NUM_194.bam", "path/to/NUM_195/NUM_195.bam", "path/to/NUM_196/NUM_196.bam"],
     "FAM_123": ["path/to/NUM_123/NUM_123.bam", "path/to/NUM_126/NUM_126.bam", "path/to/NUM_127/NUM_127.bam"]}
     
FAMILIES = list(d)

rule all:
    input:
        ...
        
wildcard_constraints:
    # the constraint name must match the wildcard used in the rules ({fam})
    fam = "|".join(FAMILIES)

.....some other rules 

rule SelectVariants:
    input:
        invcf="{fam}/{fam}.vcf"
    params:
        ref="myref.fasta"
    output:
        out="{fam}/{fam}.proband.vcf",
        out2="{fam}/{fam}.p.avinput"
    shell:
        """
        proband=NUM_194  # <--- should be the first sample of the family's list in d, for example NUM_194
        gatk --java-options "-Xms2G -Xmx2g -XX:ParallelGCThreads=2" SelectVariants -R {params.ref} -V {input.invcf} -sn "$proband" -O {output.out}
        convert2annovar -format vcf4 --includeinfo {output.out} > {output.out2}
        """
     

Upvotes: 2

Views: 89

Answers (1)

dariober

Reputation: 9062

You could use an input function (a lambda here) that looks up the first BAM of the family in d, like this:

rule SelectVariants:
    input:
        invcf="{fam}/{fam}.vcf",
        proband=lambda wc: d[wc.fam][0],
    ...
    shell:
        """
        gatk ... -sn {input.proband} ...
        """
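
Note that d[wc.fam][0] is the full BAM path (e.g. path/to/NUM_194/NUM_194.bam), while -sn expects a sample name. Below is a minimal sketch of one way to derive the identifier, assuming the sample name in the VCF is the same as the BAM file name without its extension (the real sample name comes from the BAM read group, so this may not hold for your data). The helper proband_id is a hypothetical name, not part of Snakemake.

```python
from pathlib import Path

# Same mapping as in the question: family -> list of BAM paths, proband first.
d = {
    "FAM_194": ["path/to/NUM_194/NUM_194.bam",
                "path/to/NUM_195/NUM_195.bam",
                "path/to/NUM_196/NUM_196.bam"],
    "FAM_123": ["path/to/NUM_123/NUM_123.bam",
                "path/to/NUM_126/NUM_126.bam",
                "path/to/NUM_127/NUM_127.bam"],
}


def proband_id(fam):
    """Return the file stem of the first BAM listed for `fam`.

    Assumption: the file stem (e.g. NUM_194) equals the sample name in the VCF.
    """
    return Path(d[fam][0]).stem


print(proband_id("FAM_194"))
```

In the Snakefile this could probably be wired in as a params function, e.g. proband=lambda wc: proband_id(wc.fam) under params:, with -sn {params.proband} in the shell command. Using params instead of input also keeps the BAM from being treated as an input file of the rule.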

Upvotes: 3
