Reputation: 1088
I've spent the past year working on a project executed on a SLURM-managed cluster. Now we want to make our results reproducible and to do so, we're porting it to snakemake. However, I am learning it from scratch and it's giving me a headache.
Below is my code:
# module load python/3.7
# python -m venv ./venv
# source ./venv/bin/activate
# pip install snakemake
# snakemake --version
configfile: "config.yaml"
#localrules:
vcf=config["vcf"],
ncbiFiles=config["ncbiFiles"]
LC=config["leafcutter"]
rule all:
input: ".prepare_phen_table.chkpnt"
rule filter_vcf:
input:
expand("{vcf}", vcf=config["vcf"]),
expand("{ncbiFiles}", ncbiFiles=config["ncbiFiles"])
output:
expand("{ncbiFiles}/phg000830.v1.GTEx_WGS.genotype-calls-vcf.c1/GTExWGSGenotypeMatrixBiallelicOnly.HQ.vcf.gz", ncbiFiles=config["ncbiFiles"])
shell:
expand("sbatch --wait --export=vcf={vcf},outdir=$PWD src/sqtl_mapping/primary/sh/00a_bcftools_filter.sh", vcf=config["vcf"])
rule index_vcf:
input:
expand("{ncbiFiles}/phg000830.v1.GTEx_WGS.genotype-calls-vcf.c1/GTExWGSGenotypeMatrixBiallelicOnly.HQ.vcf.gz", ncbiFiles=config["ncbiFiles"])
output:
expand("{ncbiFiles}/phg000830.v1.GTEx_WGS.genotype-calls-vcf.c1/GTExWGSGenotypeMatrixBiallelicOnly.vcf.gz.tbi",ncbiFiles=config["ncbiFiles"]),
expand("{ncbiFiles}/.index_vcf.chkpnt",ncbiFiles=config["ncbiFiles"])
shell:
expand("sbatch --export=outdir=$PWD src/sqtl_mapping/primary/sh/00b_index_vcf.sh;"
"touch {ncbiFiles}/.index_vcf.chkpnt",ncbiFiles=config["ncbiFiles"])
rule junc_cluster:
input:
expand("{ncbiFiles}/.index_vcf.chkpnt", ncbiFiles=config["ncbiFiles"])
output:
".junc_cluster.chkpnt"
shell:
"sbatch --wait src/sqtl_mapping/sh/01_junc_cluster.sh;"
"touch .junc_cluster.chkpnt"
rule intron_clustering:
input:
".junc_cluster.chkpnt"
output:
".intron_clustering.chkpnt"
shell:
expand("sbatch --wait src/sqtl_mapping/sh/02_intronclustering.sh {LC};"
"touch .intron_clustering.chkpnt;"
"cd intronclustering/", LC=config["leafcutter"])
rule prepare_phen_table:
input:
LC,
".intron_clustering.chkpnt"
output:
".prepare_phen_table.chkpnt"
shell:
expand("sbatch --wait src/sqtl_mapping/sh/03_prepare_phen_table.sh {LC};"
"touch .prepare_phen_table.chkpnt",LC=config["leafcutter"])
Please assume the config.yaml
is fine. When I call snakemake -n
, I get the following error:
(venv) [[email protected]@rmccoy22-dev neand_sQTL]$ snakemake -n
Building DAG of jobs...
Job counts:
count jobs
1 all
1 index_vcf
1 intron_clustering
1 junc_cluster
1 prepare_phen_table
5
[Fri Aug 16 12:31:53 2019]
rule index_vcf:
input: /scratch/groups/rmccoy22/Ne_sQTL/files/phg000830.v1.GTEx_WGS.genotype-calls-vcf.c1/GTExWGSGenotypeMatrixBiallelicOnly.HQ.vcf.gz
output: /scratch/groups/rmccoy22/Ne_sQTL/files/phg000830.v1.GTEx_WGS.genotype-calls-vcf.c1/GTExWGSGenotypeMatrixBiallelicOnly.vcf.gz.tbi, /scratch/groups/rmccoy22/Ne_sQTL/files/.index_vcf.chkpnt
jobid: 4
Traceback (most recent call last):
File "/scratch/groups/rmccoy22/aseyedi2/neand_sQTL/venv/lib/python3.7/site-packages/snakemake/__init__.py", line 547, in snakemake
export_cwl=export_cwl)
File "/scratch/groups/rmccoy22/aseyedi2/neand_sQTL/venv/lib/python3.7/site-packages/snakemake/workflow.py", line 674, in execute
success = scheduler.schedule()
File "/scratch/groups/rmccoy22/aseyedi2/neand_sQTL/venv/lib/python3.7/site-packages/snakemake/scheduler.py", line 278, in schedule
self.run(job)
File "/scratch/groups/rmccoy22/aseyedi2/neand_sQTL/venv/lib/python3.7/site-packages/snakemake/scheduler.py", line 294, in run
error_callback=self._error)
File "/scratch/groups/rmccoy22/aseyedi2/neand_sQTL/venv/lib/python3.7/site-packages/snakemake/executors.py", line 75, in run
self._run(job)
File "/scratch/groups/rmccoy22/aseyedi2/neand_sQTL/venv/lib/python3.7/site-packages/snakemake/executors.py", line 86, in _run
self.printjob(job)
File "/scratch/groups/rmccoy22/aseyedi2/neand_sQTL/venv/lib/python3.7/site-packages/snakemake/executors.py", line 92, in printjob
job.log_info(skip_dynamic=True)
File "/scratch/groups/rmccoy22/aseyedi2/neand_sQTL/venv/lib/python3.7/site-packages/snakemake/jobs.py", line 825, in log_info
logger.shellcmd(self.shellcmd, indent=indent)
File "/scratch/groups/rmccoy22/aseyedi2/neand_sQTL/venv/lib/python3.7/site-packages/snakemake/jobs.py", line 323, in shellcmd
self.rule.shellcmd else None)
File "/scratch/groups/rmccoy22/aseyedi2/neand_sQTL/venv/lib/python3.7/site-packages/snakemake/jobs.py", line 732, in format_wildcards
return format(string, **_variables)
File "/scratch/groups/rmccoy22/aseyedi2/neand_sQTL/venv/lib/python3.7/site-packages/snakemake/utils.py", line 378, in format
return fmt.format(_pattern, *args, **variables)
File "/software/apps/python/3.7/lib/python3.7/string.py", line 186, in format
return self.vformat(format_string, args, kwargs)
File "/software/apps/python/3.7/lib/python3.7/string.py", line 190, in vformat
result, _ = self._vformat(format_string, args, kwargs, used_args, 2)
File "/software/apps/python/3.7/lib/python3.7/string.py", line 200, in _vformat
self.parse(format_string):
File "/software/apps/python/3.7/lib/python3.7/string.py", line 284, in parse
return _string.formatter_parser(format_string)
TypeError: expected str, got list
I have no idea what to make of it other than possibly that there is some sort of bug or incompatibility with what I'm trying to do.
Thank you for any help you can offer me.
Upvotes: 2
Views: 167
Reputation: 3701
Okay I decided to take the liberty to also give some style feedback while at it :). This is what I would make it (I ofcourse couldn't test it):
configfile: "config.yaml"
rule all:
input:
".prepare_phen_table.chkpnt"
rule filter_vcf:
input:
expand("{vcf}", vcf=config["vcf"]),
expand("{ncbiFiles}", ncbiFiles=config["ncbiFiles"])
output:
expand("{ncbiFiles}/phg000830.v1.GTEx_WGS.genotype-calls-vcf.c1/GTExWGSGenotypeMatrixBiallelicOnly.HQ.vcf.gz", ncbiFiles=config["ncbiFiles"])
shell:
"src/sqtl_mapping/primary/sh/00a_bcftools_filter.sh"
rule index_vcf:
input:
expand("{ncbiFiles}/phg000830.v1.GTEx_WGS.genotype-calls-vcf.c1/GTExWGSGenotypeMatrixBiallelicOnly.HQ.vcf.gz", ncbiFiles=config["ncbiFiles"])
output:
expand("{ncbiFiles}/phg000830.v1.GTEx_WGS.genotype-calls-vcf.c1/GTExWGSGenotypeMatrixBiallelicOnly.vcf.gz.tbi",ncbiFiles=config["ncbiFiles"]),
touch(expand("{ncbiFiles}/.index_vcf.chkpnt",ncbiFiles=config["ncbiFiles"]))
shell:
"src/sqtl_mapping/primary/sh/00b_index_vcf.sh"
rule junc_cluster:
input:
expand("{ncbiFiles}/.index_vcf.chkpnt", ncbiFiles=config["ncbiFiles"])
output:
touch(".junc_cluster.chkpnt")
shell:
"src/sqtl_mapping/sh/01_junc_cluster.sh"
rule intron_clustering:
input:
".junc_cluster.chkpnt"
output:
touch(".intron_clustering.chkpnt")
params:
LC=config["leafcutter"]
shell:
"src/sqtl_mapping/sh/02_intronclustering.sh {params.LC}"
rule prepare_phen_table:
input:
config["leafcutter"],
".intron_clustering.chkpnt"
output:
touch(".prepare_phen_table.chkpnt")
params:
LC=config["leafcutter"]
shell:
"src/sqtl_mapping/sh/03_prepare_phen_table.sh {params.LC}"
Most importantly this should fix your issue that snakemake complains that it expects a string but gets a list. The expand function returns a list of all combinations, which in your case was a list of one long (but still a list and not a string). I just stored the LC in a param, instead of trying to fill its value by calling expand.
Second I changed your checkpoint touches to a cool in-built function of snakemake.
Last but not least, I deleted all your sbatch calls. The idea of snakemake is that it can be executed on any platform. Local computers, and superclusters alike. If you hardcode the sbatch command it will only work on slurm clusters. If you want to execute your code on your local computer you just type
snakemake ...[params]
and if you want to run it on a slurm cluster all you have to change is
snakemake ...[params] --cluster "sbatch"
And snakemake will prepend the sbatch command to all your function calls. Easy! Take a look again at the cluster docs.
Make sure to take a look again at the snakemake docs and tutorial, so that you understand the idea/paradigm of snakemake. Probably my changes fix 90% of your issues, but need some small tinkering on your side in the end. Good luck!
Upvotes: 2