Reputation: 95
I am mapping and counting sequences that are mixed from two species. I have two rules in my pipeline because each specie has it's own GTF file. I would like to be able to merge this into one rule, and then have the specie wildcard dictate which GTF file to use (using a dictionary for example). These are the rules:
rule count_reads_human:
input:
ibam = "sorted_reads/human/{spc}_sorted.bam",
gtf = "/nadata/users/username/annotations/hg38/gencode.v38.annotation.gtf"
output:
bam = "counted_reads/human/{spc}_countBAM.bam",
htout = "counted_reads/human/{spc}_htseq_count.out"
log:
"logs/count_reads/human/{spc}.log"
shell:
"htseq-count -o {output.bam} -p bam {input.ibam} {input.gtf} > {output.htout} 2> {log}"
rule count_reads_mouse:
input:
ibam = "sorted_reads/mouse/{spc}_sorted.bam",
gtf = "/nadata/users/username/annotations/mm10/gencode.vM20.annotation.gtf"
output:
bam = "counted_reads/mouse/{spc}_countBAM.bam",
htout = "counted_reads/mouse/{spc}_htseq_count.out"
log:
"logs/count_reads/mouse/{spc}.log"
shell:
"htseq-count -o {output.bam} -p bam {input.ibam} {input.gtf} > {output.htout} 2> {log}"
What I would like to do is something of the sort:
gtfDict = {"human": "path/to/human/gtf/file.gtf", "mouse": "/path/to/mouse/gtf/file.gtf"}
rule count_reads:
input:
ibam = "sorted_reads/{specie}/{spc}_sorted.bam",
gtf = gtfDict[{specie}]
output:
bam = "counted_reads/{specie}/{spc}_countBAM.bam",
htout = "counted_reads/{specie}/{spc}_htseq_count.out"
log:
"logs/count_reads/{specie}/{spc}.log"
shell:
"htseq-count -o {output.bam} -p bam {input.ibam} {input.gtf} > {output.htout} 2> {log}"
My merged rule however does not work.
Upvotes: 1
Views: 42
Reputation: 95
Figured it out. Need to use lambda:
gtfDict = {"human": "/path/to/human.gtf",
"mouse": "/path/to/mouse.gtf"}
rule count_reads:
input:
ibam = "sorted_reads/{specie}/{spc}_sorted.bam",
gtf = lambda wildcards: gtfDict[wildcards.specie]
output:
bam = "counted_reads/{specie}/{spc}_countBAM.bam",
htout = "counted_reads/{specie}/{spc}_htseq_count.out"
log:
"logs/count_reads/{specie}/{spc}.log"
shell:
"htseq-count -o {output.bam} -p bam {input.ibam} {input.gtf} > {output.htout} 2> {log}"
Upvotes: 1