Snakemake special symbol in wildcard

Question

I have a Snakemake rule, shown below, whose wildcard contains special characters, so I escape them using re.sub (see answer here). The output file should be data/extract_AAV(1).csv.

import re
rule get_data:
    input:
    output: "data/extract_{re.sub(r'([()])', r'\\\1', filename)}.csv"
    shell: "python get_data.py --filename {re.sub(r'([()])', r'\\\1', wildcards.filename)}"

However, I get an error as follows:

module 're' has no attribute 'sub(r'('

Running the re module works fine in Python:

import re

filename = 'extract_AAV(1).csv'
print(re.sub(r'([()])', r'\\\1', filename))
# prints: extract_AAV\(1\).csv

A reproducible example of the error when passing wildcards with special characters from Snakemake to a python script is as follows:

Snakemake file:

rule get_data:
     output: "extract_{sample}.csv"
     shell: "python run.py --fn {wildcards.sample}"

run.py:

import argparse

import pandas as pd

# Read the sample name passed by the Snakemake rule.
parser = argparse.ArgumentParser()
parser.add_argument('--fn', type=str)
args = parser.parse_args()

# Write a small table to extract_<fn>.csv.
df = pd.DataFrame({'a': [1, 2, 3]})
df.to_csv("extract_" + args.fn + '.csv')

Command to execute, attempt 1:

$ snakemake extract_AAV(1).csv --cores 1
bash: syntax error near unexpected token `('

Command to execute, attempt 2:

$ snakemake extract_AAV\(1\).csv --cores 1
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job counts:
        count   jobs
        1       get_data
        1

[Wed Apr 29 11:31:34 2020]
rule get_data:
    output: extract_AAV(1).csv
    jobid: 0
    wildcards: sample=AAV(1)
/bin/bash: -c: line 0: syntax error near unexpected token `('
/bin/bash: -c: line 0: `set -euo pipefail;  python run.py --fn AAV(1)'
[Wed Apr 29 11:31:34 2020]
Error in rule get_data:
    jobid: 0
    output: extract_AAV(1).csv
    shell:
        python run.py --fn AAV(1)
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /home/XXXXX/.snakemake/log/2020-04-29T113134.773987.snakemake.log

Dmitry Kuzminov · Accepted Answer

There is no need to escape symbols in wildcards in your case. Moreover, a wildcard is just an identifier; expressions are not allowed inside the braces. The script below illustrates how to produce the file data/extract_AAV(1).csv:

rule all:
    input: "data/extract_AAV(1).csv"

rule get_data:
    output: "data/extract_AAV({index}).csv"
    shell: "touch {output}"
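
The error message in the question is consistent with how Python's own str.format treats a replacement field: it accepts a name followed by attribute access or indexing, and it never calls functions. The sketch below reproduces a similar message with plain str.format. It assumes Snakemake's placeholder handling behaves the same way in this respect, and try_format is a hypothetical helper written only for this illustration.

```python
import re


def try_format(template, **names):
    """Format `template` and return the result, or the error text if it fails."""
    try:
        return template.format(**names)
    except (AttributeError, KeyError, ValueError) as err:
        return f"{type(err).__name__}: {err}"


# A plain identifier is a valid replacement field.
ok = try_format("data/extract_{filename}.csv", filename="AAV(1)")
print(ok)

# A function call is not: the text after "re." is read as an attribute name
# (up to the first "[" or "."), so Python looks up an attribute "sub(r'(".
bad = try_format("data/extract_{re.sub(r'([()])', r'_', filename)}.csv",
                 re=re, filename="AAV(1)")
print(bad)
```

The first call succeeds because the parentheses are ordinary characters in the substituted value; only the expression inside the braces is rejected.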

In my experience, spaces or quotes in filenames can cause problems. For example, if the filename contains whitespace, you need to wrap it in quotes in the shell command:

rule all:
    input: "data/extract_AAV (1).csv"

rule get_data:
    output: "data/extract_AAV ({index}).csv"
    shell: "touch \"{output}\""
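
Quoting by hand works for spaces but not for a filename that itself contains a double quote. A more general option is to let Python produce the shell-safe form with shlex.quote from the standard library, as in the sketch below. Here build_command is a hypothetical helper, and run.py --fn is the script from the question. Snakemake also documents a q format modifier (for example {output:q}) intended for the same purpose; check the documentation of your version before relying on it.

```python
import shlex


def build_command(sample):
    """Return a shell command line with `sample` quoted as a single argument."""
    return "python run.py --fn " + shlex.quote(sample)


# Parentheses and spaces would otherwise be interpreted by bash.
print(build_command("AAV(1)"))
print(build_command("AAV (1)"))

# Names made only of safe characters are returned unchanged.
print(build_command("AAV1"))
```

With this approach the value AAV(1) from the failing run in the question reaches the script as one argument instead of triggering a bash syntax error.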
