Reputation: 11
I have a snakemake pipeline with rules that call other programs and custom R and python scripts.
I have multiple datasets on which this same pipeline needs to run. Usually I would make a separate folder for each dataset, put a dataset-specific config file in it, and run the pipeline on each one individually.
As I have 20+ datasets this time, I was wondering if there is a more automated way to do this. There are mainly 4 parameters which change between the datasets: input file location, primer, quality control parameter and output dir for results. Is there a way to have a 'master' config file which would hold these 4 parameters for every dataset, and a snakefile which then calls the second snakefile once per dataset?
The whole idea looks to me like a for loop over arrays of these 4 parameters, but I can't figure out how to implement it in snakemake.
Any suggestions and ideas are welcome! Thanks, Hena
Upvotes: 1
Views: 621
Reputation: 8194
Provided all the parameters are somewhat "encoded" in the output file names, I think this can be done using a single snakefile.
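To illustrate what "encoded in the output file names" means, here is a minimal sketch in plain Python (not Snakemake itself). It assumes a hypothetical output pattern of the form {out_dir}/{p1}_{p2}_{p3}_{p4}.out and mimics Snakemake's documented default of matching each wildcard with the regex ".+". The helper names pattern_to_regex and parse_target are made up for this example.

```python
import re

# Hypothetical output pattern; {name} placeholders play the role of wildcards.
PATTERN = "{out_dir}/{p1}_{p2}_{p3}_{p4}.out"


def pattern_to_regex(pattern):
    """Turn a '{name}' pattern into a compiled regex with named groups."""
    # With one capture group, re.split alternates literal text (even indices)
    # and wildcard names (odd indices).
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)      # literal text, e.g. "/" or ".out"
        else:
            regex += f"(?P<{part}>.+)"    # assumed default wildcard regex ".+"
    return re.compile(regex + "$")


def parse_target(path, pattern=PATTERN):
    """Return the parameter values encoded in `path`, or None if it does not match."""
    match = pattern_to_regex(pattern).match(path)
    return match.groupdict() if match else None


print(parse_target("dat1/A_a_1_01.out"))
```

Because the file name alone has to determine every parameter, values that themselves contain the separator (here "_") can make the match ambiguous, so it is probably safest to keep them free of it.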
Your main configuration file would include a section for each dataset, and this section could contain the desired output directory as well as a path to a configuration file specific to this dataset.
Proof of concept:
Snakefile:

    import yaml

    # Each top-level key of the main config is one dataset
    datasets = list(config.keys())
    results = []
    for dataset in datasets:
        out_dir = config[dataset]["out_dir"]
        # Load the dataset-specific parameters
        with open(config[dataset]["conf"]) as conf_fh:
            dat_conf = yaml.safe_load(conf_fh)
        p1 = dat_conf["p1"]
        p2 = dat_conf["p2"]
        p3 = dat_conf["p3"]
        p4 = dat_conf["p4"]
        # The target file name encodes all four parameters
        results.append(f"{out_dir}/{p1}_{p2}_{p3}_{p4}.out")

    rule all:
        input:
            results

    rule make_output:
        output:
            "{out_dir}/{p1}_{p2}_{p3}_{p4}.out"
        shell:
            "touch {output[0]}"
main_config.yaml:

    dat1:
      out_dir: "dat1"
      conf: "dat1_conf.yaml"
    dat2:
      out_dir: "dat2"
      conf: "dat2_conf.yaml"
dat1_conf.yaml:

    p1: "A"
    p2: "a"
    p3: "1"
    p4: "01"
dat2_conf.yaml:

    p1: "B"
    p2: "b"
    p3: "2"
    p4: "02"
This can be executed, for instance, as follows:

    snakemake --snakefile Snakefile --configfile main_config.yaml -j 2
This creates the following result files:
dat1/A_a_1_01.out
dat2/B_b_2_02.out
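If a single 'master' config is preferred over one extra file per dataset, the same loop idea might look like the sketch below. It is plain Python so it can be tried outside Snakemake; the dict stands in for the loaded config, and every key, path and value in it (input, primer, qc, out_dir, the file names) is a made-up placeholder rather than anything from the real pipeline. The helpers build_targets and lookup are likewise hypothetical.

```python
# Hypothetical master config: one section per dataset, holding the four
# parameters from the question. All values are placeholders.
MASTER_CONFIG = {
    "dat1": {"input": "raw/dat1.fastq", "primer": "ACGT", "qc": 20, "out_dir": "results/dat1"},
    "dat2": {"input": "raw/dat2.fastq", "primer": "TTGA", "qc": 30, "out_dir": "results/dat2"},
}


def build_targets(config):
    """Return one final output path per dataset (what `rule all` would request)."""
    return [f"{params['out_dir']}/{name}.out" for name, params in config.items()]


def lookup(config, dataset, key):
    """Fetch one parameter of one dataset, e.g. from a wildcard value."""
    return config[dataset][key]


print(build_targets(MASTER_CONFIG))
print(lookup(MASTER_CONFIG, "dat2", "primer"))
```

Inside a Snakefile, a rule whose output contains a {dataset} wildcard could then pull its settings through functions of the wildcards, for example params: primer=lambda wc: config[wc.dataset]["primer"], so that only the dataset name needs to appear in the file name.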
Upvotes: 1