CelineDion
CelineDion

Reputation: 1088

Snakemake producing wildly incoherent error when dryrunning

I've spent the past year working on a project executed on a SLURM-managed cluster. Now we want to make our results reproducible and to do so, we're porting it to snakemake. However, I am learning it from scratch and it's giving me a headache.

Below is my code:

# module load python/3.7
# python -m venv ./venv
# source ./venv/bin/activate
# pip install snakemake
# snakemake --version

configfile: "config.yaml"
#localrules:
vcf=config["vcf"],
ncbiFiles=config["ncbiFiles"]
LC=config["leafcutter"]


rule all:
    input: ".prepare_phen_table.chkpnt"

rule filter_vcf:
    input:
        expand("{vcf}", vcf=config["vcf"]),
        expand("{ncbiFiles}", ncbiFiles=config["ncbiFiles"])
    output:
        expand("{ncbiFiles}/phg000830.v1.GTEx_WGS.genotype-calls-vcf.c1/GTExWGSGenotypeMatrixBiallelicOnly.HQ.vcf.gz", ncbiFiles=config["ncbiFiles"])
    shell:
        expand("sbatch --wait --export=vcf={vcf},outdir=$PWD src/sqtl_mapping/primary/sh/00a_bcftools_filter.sh", vcf=config["vcf"])

rule index_vcf:
    input:
        expand("{ncbiFiles}/phg000830.v1.GTEx_WGS.genotype-calls-vcf.c1/GTExWGSGenotypeMatrixBiallelicOnly.HQ.vcf.gz", ncbiFiles=config["ncbiFiles"])
    output:
        expand("{ncbiFiles}/phg000830.v1.GTEx_WGS.genotype-calls-vcf.c1/GTExWGSGenotypeMatrixBiallelicOnly.vcf.gz.tbi",ncbiFiles=config["ncbiFiles"]),
        expand("{ncbiFiles}/.index_vcf.chkpnt",ncbiFiles=config["ncbiFiles"])
    shell:
        expand("sbatch --export=outdir=$PWD src/sqtl_mapping/primary/sh/00b_index_vcf.sh;"
        "touch {ncbiFiles}/.index_vcf.chkpnt",ncbiFiles=config["ncbiFiles"])

rule junc_cluster:
    input:
        expand("{ncbiFiles}/.index_vcf.chkpnt", ncbiFiles=config["ncbiFiles"])
    output:
        ".junc_cluster.chkpnt"
    shell:
        "sbatch --wait src/sqtl_mapping/sh/01_junc_cluster.sh;"
        "touch .junc_cluster.chkpnt"

rule intron_clustering:
    input:
        ".junc_cluster.chkpnt"
    output:
        ".intron_clustering.chkpnt"
    shell:
        expand("sbatch --wait src/sqtl_mapping/sh/02_intronclustering.sh {LC};"
        "touch .intron_clustering.chkpnt;"
        "cd intronclustering/", LC=config["leafcutter"])

rule prepare_phen_table:
    input:
        LC,
        ".intron_clustering.chkpnt"
    output:
        ".prepare_phen_table.chkpnt"
    shell:
        expand("sbatch --wait src/sqtl_mapping/sh/03_prepare_phen_table.sh {LC};"
        "touch .prepare_phen_table.chkpnt",LC=config["leafcutter"])

Please assume the config.yaml is fine. When I call snakemake -n, I get the following error:

(venv) [[email protected]@rmccoy22-dev neand_sQTL]$ snakemake -n
Building DAG of jobs...
Job counts:
    count   jobs
    1   all
    1   index_vcf
    1   intron_clustering
    1   junc_cluster
    1   prepare_phen_table
    5

[Fri Aug 16 12:31:53 2019]
rule index_vcf:
    input: /scratch/groups/rmccoy22/Ne_sQTL/files/phg000830.v1.GTEx_WGS.genotype-calls-vcf.c1/GTExWGSGenotypeMatrixBiallelicOnly.HQ.vcf.gz
    output: /scratch/groups/rmccoy22/Ne_sQTL/files/phg000830.v1.GTEx_WGS.genotype-calls-vcf.c1/GTExWGSGenotypeMatrixBiallelicOnly.vcf.gz.tbi, /scratch/groups/rmccoy22/Ne_sQTL/files/.index_vcf.chkpnt
    jobid: 4

Traceback (most recent call last):
  File "/scratch/groups/rmccoy22/aseyedi2/neand_sQTL/venv/lib/python3.7/site-packages/snakemake/__init__.py", line 547, in snakemake
    export_cwl=export_cwl)
  File "/scratch/groups/rmccoy22/aseyedi2/neand_sQTL/venv/lib/python3.7/site-packages/snakemake/workflow.py", line 674, in execute
    success = scheduler.schedule()
  File "/scratch/groups/rmccoy22/aseyedi2/neand_sQTL/venv/lib/python3.7/site-packages/snakemake/scheduler.py", line 278, in schedule
    self.run(job)
  File "/scratch/groups/rmccoy22/aseyedi2/neand_sQTL/venv/lib/python3.7/site-packages/snakemake/scheduler.py", line 294, in run
    error_callback=self._error)
  File "/scratch/groups/rmccoy22/aseyedi2/neand_sQTL/venv/lib/python3.7/site-packages/snakemake/executors.py", line 75, in run
    self._run(job)
  File "/scratch/groups/rmccoy22/aseyedi2/neand_sQTL/venv/lib/python3.7/site-packages/snakemake/executors.py", line 86, in _run
    self.printjob(job)
  File "/scratch/groups/rmccoy22/aseyedi2/neand_sQTL/venv/lib/python3.7/site-packages/snakemake/executors.py", line 92, in printjob
    job.log_info(skip_dynamic=True)
  File "/scratch/groups/rmccoy22/aseyedi2/neand_sQTL/venv/lib/python3.7/site-packages/snakemake/jobs.py", line 825, in log_info
    logger.shellcmd(self.shellcmd, indent=indent)
  File "/scratch/groups/rmccoy22/aseyedi2/neand_sQTL/venv/lib/python3.7/site-packages/snakemake/jobs.py", line 323, in shellcmd
    self.rule.shellcmd else None)
  File "/scratch/groups/rmccoy22/aseyedi2/neand_sQTL/venv/lib/python3.7/site-packages/snakemake/jobs.py", line 732, in format_wildcards
    return format(string, **_variables)
  File "/scratch/groups/rmccoy22/aseyedi2/neand_sQTL/venv/lib/python3.7/site-packages/snakemake/utils.py", line 378, in format
    return fmt.format(_pattern, *args, **variables)
  File "/software/apps/python/3.7/lib/python3.7/string.py", line 186, in format
    return self.vformat(format_string, args, kwargs)
  File "/software/apps/python/3.7/lib/python3.7/string.py", line 190, in vformat
    result, _ = self._vformat(format_string, args, kwargs, used_args, 2)
  File "/software/apps/python/3.7/lib/python3.7/string.py", line 200, in _vformat
    self.parse(format_string):
  File "/software/apps/python/3.7/lib/python3.7/string.py", line 284, in parse
    return _string.formatter_parser(format_string)
TypeError: expected str, got list

I have no idea what to make of it other than possibly that there is some sort of bug or incompatibility with what I'm trying to do.

Thank you for any help you can offer me.

Upvotes: 2

Views: 167

Answers (1)

Maarten-vd-Sande
Maarten-vd-Sande

Reputation: 3701

Okay I decided to take the liberty to also give some style feedback while at it :). This is what I would make it (I ofcourse couldn't test it):

configfile: "config.yaml"


rule all:
    input: 
        ".prepare_phen_table.chkpnt"


rule filter_vcf:
    input:
        expand("{vcf}", vcf=config["vcf"]),
        expand("{ncbiFiles}", ncbiFiles=config["ncbiFiles"])
    output:
        expand("{ncbiFiles}/phg000830.v1.GTEx_WGS.genotype-calls-vcf.c1/GTExWGSGenotypeMatrixBiallelicOnly.HQ.vcf.gz", ncbiFiles=config["ncbiFiles"])
    shell:
        "src/sqtl_mapping/primary/sh/00a_bcftools_filter.sh"


rule index_vcf:
    input:
        expand("{ncbiFiles}/phg000830.v1.GTEx_WGS.genotype-calls-vcf.c1/GTExWGSGenotypeMatrixBiallelicOnly.HQ.vcf.gz", ncbiFiles=config["ncbiFiles"])
    output:
        expand("{ncbiFiles}/phg000830.v1.GTEx_WGS.genotype-calls-vcf.c1/GTExWGSGenotypeMatrixBiallelicOnly.vcf.gz.tbi",ncbiFiles=config["ncbiFiles"]),
        touch(expand("{ncbiFiles}/.index_vcf.chkpnt",ncbiFiles=config["ncbiFiles"]))
    shell:
        "src/sqtl_mapping/primary/sh/00b_index_vcf.sh"


rule junc_cluster:
    input:
        expand("{ncbiFiles}/.index_vcf.chkpnt", ncbiFiles=config["ncbiFiles"])
    output:
        touch(".junc_cluster.chkpnt")
    shell:
        "src/sqtl_mapping/sh/01_junc_cluster.sh"


rule intron_clustering:
    input:
        ".junc_cluster.chkpnt"
    output:
        touch(".intron_clustering.chkpnt")
    params:
        LC=config["leafcutter"]
    shell:
        "src/sqtl_mapping/sh/02_intronclustering.sh {params.LC}"


rule prepare_phen_table:
    input:
        config["leafcutter"],
        ".intron_clustering.chkpnt"
    output:
        touch(".prepare_phen_table.chkpnt")
    params:
        LC=config["leafcutter"]
    shell:
        "src/sqtl_mapping/sh/03_prepare_phen_table.sh {params.LC}"

Most importantly this should fix your issue that snakemake complains that it expects a string but gets a list. The expand function returns a list of all combinations, which in your case was a list of one long (but still a list and not a string). I just stored the LC in a param, instead of trying to fill its value by calling expand.

Second I changed your checkpoint touches to a cool in-built function of snakemake.

Last but not least, I deleted all your sbatch calls. The idea of snakemake is that it can be executed on any platform. Local computers, and superclusters alike. If you hardcode the sbatch command it will only work on slurm clusters. If you want to execute your code on your local computer you just type

snakemake ...[params]

and if you want to run it on a slurm cluster all you have to change is

snakemake ...[params] --cluster "sbatch"

And snakemake will prepend the sbatch command to all your function calls. Easy! Take a look again at the cluster docs.

Make sure to take a look again at the snakemake docs and tutorial, so that you understand the idea/paradigm of snakemake. Probably my changes fix 90% of your issues, but need some small tinkering on your side in the end. Good luck!

Upvotes: 2

Related Questions