I wrote some R scripts, and I'd like to use Snakemake
to integrate them into an analysis pipeline. The pipeline is almost finished except for one R script. In this R script, one of the parameters is a list, like this:
group=list(A=c("a","b","c"),B=c("d","e"),C=c("f","g","h"))
I don't know how to pass this kind of parameter from Snakemake.
The R script and the Snakemake
rule I wrote are as follows:
library(optparse)
library(ggtree)
library(ggplot2)
library(colorspace)
# help doc
option_list=list(
make_option("--input",type="character",help="<file> input file"),
make_option("--output",type="character",help="<file> output file"),
make_option("--families",type="character",help="<list> a list containing classified families"),
make_option("--wide",type="numeric",help="<num> width of figure"),
make_option("--high",type="numeric",help="<num> height of figure"),
make_option("--labsize",type="numeric",help="<num> size of tip label")
)
opt_parser=OptionParser(usage="\n\nName: cluster_vil.r",
description="\nDescription: This script is to visualize the result of cluster analysis.
\nContact: huisu<[email protected]>
\nDate: 9.5.2019",
option_list=option_list,
epilogue="Example: Rscript cluster_vil.r --input mega_IBSout_male.nwk
--output NJ_IBS_male.ggtree.pdf
--families list(Family_A=c('3005','3021','3009','3119'),Family_B=c('W','4023'),Family_C=c('810','3003'),Family_D=c('4019','1001','4015','4021'),Family_E=c('4017','3115'))
--wide 18
--high 8
--labsize 7"
)
opt=parse_args(opt_parser)
input=opt$input
output=opt$output
families=opt$families
wide=opt$wide
high=opt$high
labsize=opt$labsize
# start plot
nwk=read.tree(input)
tree=groupOTU(nwk, families)
pdf(file=output,width=wide,height=high) # 18,8 for male samples; 12,18 for all samples
ggtree(tree,aes(color=group),branch.length='none') + geom_tiplab(size=labsize) +
theme(legend.position=("left"),legend.text=element_text(size=12),legend.title=element_text(size=18),
legend.key.width=unit(0.5,"inches"),legend.key.height=unit(0.3,"inches")) +
scale_color_manual(values=c("black", rainbow_hcl(length(families)))) +
theme(plot.margin=unit(rep(2,4),'cm'))
dev.off()
rule cluster_virual:
    input:
        nwk="mega_IBS.nwk",
    output:
        all="mega_IBS.pdf",
    params:
        fam=collections.OrderedDict([('Family_A',['3005','3021','3009','3119']),
                                     ('Family_B',['W','4023']),
                                     ('Family_C',['810','3003']),
                                     ('Family_D',["4019","1001","4015","4021"]),
                                     ('Family_E',["4017","3115"])])
    message:
        "====cluster analysis visualization===="
    shell:
        "Rscript Rfunction/cluster_vil.r "
        "--input {input.nwk} "
        "--output {output.all} "
        "--families {params.fam} "
        "--wide 12 "
        "--high 18 "
        "--labsize 3"
So, how do I properly write the parameter fam
in Snakemake?
I think in Python/Snakemake you can use an OrderedDict
to represent an R list. So this R list:
fam=list(A=c('a','b','c'),B=c('d','e'),C=c('f','g','h'))
would be written in params as:
params:
    fam=collections.OrderedDict([('A', ['a', 'b', 'c']),
                                 ('B', ['d', 'e']),
                                 ('C', ['f', 'g', 'h'])])
Of course, add import collections
to the top of your snakemake file (or wherever you want to import the collections module).
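One caveat, offered with some hedging: the shell directive only passes text to Rscript, and the script declares --families with type="character", so the R side receives a string rather than a list whichever Python structure is used in params. One possible workaround, sketched here under stated assumptions and not taken from either post, is to build the R list expression as a string in Python with a hypothetical helper, to_r_list, and turn it back into a list inside the R script.

```python
from collections import OrderedDict


def to_r_list(families):
    """Render a mapping of group name -> member IDs as R list source text.

    Hypothetical helper (not part of Snakemake or optparse). Assumes group
    names are valid R names and member IDs contain no quote characters.
    """
    groups = []
    for name, members in families.items():
        # Quote every member so IDs like '3005' stay character strings in R.
        quoted = ",".join("'{}'".format(m) for m in members)
        groups.append("{}=c({})".format(name, quoted))
    return "list({})".format(",".join(groups))


fam = OrderedDict([
    ("Family_A", ["3005", "3021", "3009", "3119"]),
    ("Family_B", ["W", "4023"]),
])

print(to_r_list(fam))
# list(Family_A=c('3005','3021','3009','3119'),Family_B=c('W','4023'))
```

In the rule this could be used as fam=to_r_list(collections.OrderedDict([...])), with the placeholder wrapped in double quotes ("--families \"{params.fam}\" ") so the shell passes the whole expression as one argument. The R script would then need something like families=eval(parse(text=opt$families)) instead of families=opt$families. Since Python 3.7 a plain dict also keeps insertion order, so OrderedDict is optional here.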