Reputation: 1149
I am trying to run an analysis in Snakemake where the proband is always the first BAM file in each family's list, i.e. NUM_194 and NUM_123. Is there a way to use, via the wildcards, the identifier of the first BAM file in each list in d on the proband line?
d = {"FAM_194": ["path/to/NUM_194/NUM_194.bam", "path/to/NUM_195/NUM_195.bam", "path/to/NUM_196/NUM_196.bam"],
     "FAM_123": ["path/to/NUM_123/NUM_123.bam", "path/to/NUM_126/NUM_126.bam", "path/to/NUM_127/NUM_127.bam"]}
FAMILIES = list(d)
rule all:
    input:
        ...
wildcard_constraints:
    # the constraint name must match the {fam} wildcard used in the rules below
    fam = "|".join(FAMILIES)
.....some other rules
rule SelectVariants:
    input:
        invcf="{fam}/{fam}.vcf"
    params:
        ref="myref.fasta"
    output:
        out="{fam}/{fam}.proband.vcf",
        out2="{fam}/{fam}.p.avinput"
    shell:
        """
        proband=NUM_194  # <--- the first sample of the list(d), for example NUM_194
        gatk --java-options "-Xms2G -Xmx2g -XX:ParallelGCThreads=2" SelectVariants -R {params.ref} -V {input.invcf} -sn "$proband" -O {output.out}
        convert2annovar -format vcf4 --includeinfo {output.out} > {output.out2}
        """
Upvotes: 2
Views: 89
Reputation: 9062
Maybe using a function as input (lambda function here) like this?
rule SelectVariants:
    input:
        invcf="{fam}/{fam}.vcf",
        proband= lambda wc: d[wc.fam][0],
    ...
    shell:
        """
        gatk ... -sn {input.proband} ...
        """
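One thing to watch: d[wc.fam][0] is the full BAM path (e.g. path/to/NUM_194/NUM_194.bam), while -sn expects a sample name. A minimal sketch of a helper that extracts the identifier, assuming the sample name in the VCF is the same as the BAM file name without its extension (that is an assumption, and first_sample_id is a hypothetical name, not part of Snakemake):

```python
from pathlib import Path

# Same mapping as in the question: family ID -> list of BAM paths.
d = {
    "FAM_194": ["path/to/NUM_194/NUM_194.bam", "path/to/NUM_195/NUM_195.bam", "path/to/NUM_196/NUM_196.bam"],
    "FAM_123": ["path/to/NUM_123/NUM_123.bam", "path/to/NUM_126/NUM_126.bam", "path/to/NUM_127/NUM_127.bam"],
}


def first_sample_id(fam, families=d):
    """Return the file name (without .bam) of the first BAM listed for a family.

    Hypothetical helper: assumes the VCF sample name equals the BAM file stem.
    """
    first_bam = families[fam][0]
    return Path(first_bam).stem


print(first_sample_id("FAM_194"))  # NUM_194
print(first_sample_id("FAM_123"))  # NUM_123
```

Since the identifier is a string rather than a file, it probably fits better under params than input, e.g. params: proband=lambda wc: first_sample_id(wc.fam), and then -sn {params.proband} in the shell command.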
Upvotes: 3