Reputation: 1088
I keep getting an error about a rule not having the same wildcards in its output rules and I can't figure out what the source of the error might be:
SyntaxError:
Not all output, log and benchmark files of rule bcftools_filter contain the same wildcards. This is crucial though, in order to avoid that two or more jobs write to the same file.
...
rule merge_YRI_GTEx:
    input:
        kg=expand("kg_vcf/1kg_yri_chr{q}.vcf.gz", q=range(1,23)),
        gtex=expand("gtex_vcf/gtex_chr{v}.snps.recode.vcf.gz", v=range(1, 23))
    output:
        "merged/merged_chr{i}.vcf.gz"
    shell:
        "bcftools merge \
            -0 \
            -O z \
            -o {output} \
            {input.kg} \
            {input.gtex}"

rule bcftools_filter:
    input:
        expand("merged/merged_chr{i}.vcf.gz", i=range(1,23))
    output:
        filt="filtered_vcf/merged_filtered_chr{i}.vcf.gz",
        chk=touch(".bcftools_filter.chkpnt")
    threads:
        4
    shell:
        "bcftools filter \
            --include 'AN=1890 && AC > 0' \
            --threads {threads} \
            -O z \
            -o {output.filt} \
            {input}"
...
rule list_merged_filtered_vcfs:
    input:
        ".bcftools_filter.chkpnt"
    output:
        "processed_vcf_list.txt"
    shell:
        "for i in {{1..22}}; do \ "
        "echo \"{config[sprime_dir]}/filtered_vcf/merged_filtered_chr${{i}}.vcf.gz\" >> \
        {output}; done"
The specific line it's complaining about is the one that's just "bcftools filter \", which is even more dumbfounding to me. I've tried giving names to the input wildcard, and even scrutinizing the rule which uses bcftools_filter's output as well as the rule which produces bcftools_filter's input, to no avail. I'm not sure what is causing this error.
Upvotes: 1
Views: 507
Reputation: 9062
I think the error comes from chk=touch(".bcftools_filter.chkpnt") not containing the wildcard {i}.
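To make the mismatch concrete, here is a minimal Python sketch of the kind of consistency check that fails for this rule. It is not Snakemake's actual implementation: the regex and the helper name wildcard_names are assumptions made for illustration, and the pattern ignores wildcard constraints such as {i,\d+}.

```python
import re

# Simplified stand-in for wildcard detection (an assumption, not Snakemake's
# real regex): a wildcard is a bare identifier in braces.
WILDCARD = re.compile(r"\{([A-Za-z_]\w*)\}")


def wildcard_names(pattern):
    """Return the set of wildcard names that appear in an output pattern."""
    return set(WILDCARD.findall(pattern))


# The two output entries of rule bcftools_filter from the question.
outputs = {
    "filt": "filtered_vcf/merged_filtered_chr{i}.vcf.gz",
    "chk": ".bcftools_filter.chkpnt",
}

names = {key: wildcard_names(pattern) for key, pattern in outputs.items()}

# Every output must carry the same wildcard set; otherwise the jobs for
# chr1..chr22 would all write the same .bcftools_filter.chkpnt file.
consistent = len({frozenset(found) for found in names.values()}) == 1

print(names)       # {'filt': {'i'}, 'chk': set()}
print(consistent)  # False
```

Because chk has no {i}, the two sets differ, which is the situation the error message describes.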
Apart from that, I'm not sure your rule is very sensible. You are passing bcftools filter a list of input files (from expand(...)), but I don't think bcftools filter accepts more than one input file. Also, your rule will create the output files filtered_vcf/merged_filtered_chr{i}.vcf.gz (one for each value of i) using the same list of input files every time. Are you sure you want expand("merged/merged_chr{i}.vcf.gz", i=range(1,23)) instead of just "merged/merged_chr{i}.vcf.gz", with values for i given somewhere upstream?
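As a rough sketch of what that could look like (untested, and resting on some assumptions): bcftools_filter takes one merged file per value of i, and the rule that writes the list requests all 22 filtered files through expand, which would make the touch checkpoint unnecessary. The name CHROMS is hypothetical, and replacing the checkpoint with a direct dependency is a guess at what the checkpoint was meant to achieve. The same per-chromosome pattern would presumably apply to merge_YRI_GTEx as well.

```snakemake
# Hypothetical helper name; the range is taken from the question.
CHROMS = range(1, 23)

rule bcftools_filter:
    input:
        # One merged file per job; {i} is filled in from the requested output.
        "merged/merged_chr{i}.vcf.gz"
    output:
        "filtered_vcf/merged_filtered_chr{i}.vcf.gz"
    threads: 4
    shell:
        "bcftools filter "
        "--include 'AN=1890 && AC > 0' "
        "--threads {threads} "
        "-O z "
        "-o {output} "
        "{input}"

rule list_merged_filtered_vcfs:
    input:
        # Requesting every filtered file here is what supplies the values of i.
        expand("filtered_vcf/merged_filtered_chr{i}.vcf.gz", i=CHROMS)
    output:
        "processed_vcf_list.txt"
    run:
        # Write one path per line, prefixed as in the original shell loop.
        with open(output[0], "w") as fh:
            for path in input:
                fh.write(f"{config['sprime_dir']}/{path}\n")
```

With this layout every output of bcftools_filter contains {i}, so the wildcard check passes and each job reads and writes exactly one chromosome.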
Upvotes: 2