Parsa

Reputation: 3236

Snakemake special symbol in wildcard

I have a Snakemake rule as follows. My wildcard contains special characters, so I escape them using re.sub, as suggested in another answer. Output file: data/extract_AAV(1).csv.

import re
rule get_data:
    input:
    output: "data/extract_{re.sub(r'([()])', r'\\\1', filename)}.csv"
    shell: "python get_data.py --filename {re.sub(r'([()])', r'\\\1', wildcards.filename)}"

However, I get an error as follows:

module 're' has no attribute 'sub(r'('

Running the re module works fine in Python:

filename = 'extract_AAV(1).csv'
print(re.sub(r'([()])', r'\\\1', filename))
# prints: extract_AAV\(1\).csv

A reproducible example of the error when passing wildcards with special characters from Snakemake to a python script is as follows:

Snakemake file:

rule get_data:
     output: "extract_{sample}.csv"
     shell: "python run.py --fn {wildcards.sample}"

run.py

import argparse

import pandas as pd

# Read the sample name passed by the Snakemake rule
parser = argparse.ArgumentParser()
parser.add_argument('--fn', type=str)
args = parser.parse_args()

df = pd.DataFrame({'a': [1, 2, 3]})
df.to_csv("extract_" + args.fn + '.csv')

Command to execute, attempt 1:

$ snakemake extract_AAV(1).csv --cores 1
bash: syntax error near unexpected token `('

Command to execute, attempt 2:

$ snakemake extract_AAV\(1\).csv --cores 1
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job counts:
        count   jobs
        1       get_data
        1

[Wed Apr 29 11:31:34 2020]
rule get_data:
    output: extract_AAV(1).csv
    jobid: 0
    wildcards: sample=AAV(1)
/bin/bash: -c: line 0: syntax error near unexpected token `('
/bin/bash: -c: line 0: `set -euo pipefail;  python run.py --fn AAV(1)'
[Wed Apr 29 11:31:34 2020]
Error in rule get_data:
    jobid: 0
    output: extract_AAV(1).csv
    shell:
        python run.py --fn AAV(1)
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /home/XXXXX/.snakemake/log/2020-04-29T113134.773987.snakemake.log

Upvotes: 1

Views: 682

Answers (2)

Dmitry Kuzminov

Reputation: 6600

There is no need to escape symbols in wildcards in your case. Moreover, a wildcard is just an identifier; expressions are not allowed inside the braces. The script below illustrates how to produce the file data/extract_AAV(1).csv:

rule all:
    input: "data/extract_AAV(1).csv"

rule get_data:
    output: "data/extract_AAV({index}).csv"
    shell: "touch '{output}'"
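
To see why an expression inside the braces is never evaluated, here is a minimal sketch in plain Python. It assumes Snakemake resolves `{...}` placeholders roughly the way `str.format` does, which is consistent with the error message in the question. `try_format` is a hypothetical helper used only for this illustration, not part of Snakemake.

```python
import re


def try_format(template, **names):
    """Return the formatted string, or the exception raised by str.format."""
    try:
        return template.format(**names)
    except (AttributeError, KeyError, ValueError) as exc:
        return exc


# The call inside the braces is not executed: the text after "re." is
# looked up as an attribute name on the module, and that lookup fails.
failed = try_format("--filename {re.sub('a', 'b', name)}", re=re, name="banana")
print(type(failed).__name__)

# Doing the substitution in ordinary Python first, and then inserting the
# finished value by name, works.
escaped = re.sub(r"([()])", r"\\\1", "AAV(1)")
print(try_format("--filename {escaped}", escaped=escaped))
```

In other words, any processing has to happen outside the placeholder (for example in a function or variable), and only a plain name goes inside the braces.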

In my experience, spaces or quotes in filenames can also cause problems. For example, if the filename contains whitespace, you need to put it in quotes in the shell command:

rule all:
    input: "data/extract_AAV (1).csv"

rule get_data:
    output: "data/extract_AAV ({index}).csv"
    shell: "touch \"{output}\""
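
The same quoting idea can be checked outside Snakemake. The sketch below uses Python's standard `shlex` module to quote filenames before they are placed in a command string. The filenames are just the examples from this thread plus one made-up name containing a quote, and the command is only built and parsed, not executed.

```python
import shlex

# Filenames that a shell would misread if they were left unquoted.
filenames = [
    "data/extract_AAV(1).csv",    # parentheses
    "data/extract_AAV (1).csv",   # whitespace and parentheses
    "data/extract_AAV's (1).csv", # hypothetical name with a single quote
]

for name in filenames:
    # shlex.quote wraps the name so a POSIX shell treats it as one argument.
    command = "touch {}".format(shlex.quote(name))
    # shlex.split parses the command the way a POSIX shell would split it.
    parsed = shlex.split(command)
    print(command, "->", parsed)
    assert parsed == ["touch", name]
```

Hand-written quotes such as \"{output}\" are fine for spaces and parentheses, but a helper like this also covers names that contain quote characters themselves. Snakemake's documentation also describes a `q` format specifier (as in `{output:q}`) for the same purpose; check that it is available in your version.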

Upvotes: 1

dariober

Reputation: 9062

Either I cannot reproduce the problem or you are making things more complicated than necessary. This works for me:

import re

samples = ['AAV(1)', 'AAV(2)']

rule all:
    input:
        expand('data/extract_{sample}.csv', sample= samples),

wildcard_constraints:
    sample= '|'.join([re.escape(x) for x in samples]),

rule one:
    output:
        'data/extract_{sample}.csv'
    shell:
        r"""
        touch '{output}'
        """

(The wildcard_constraints bit is not necessary here but I tend to use it quite liberally).
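
As a quick check of what that constraint expression evaluates to, here is a small standalone sketch in plain Python. It only uses the `re` module and the sample names from above; the variable names `constraint` and `unescaped` are introduced here for illustration.

```python
import re

samples = ["AAV(1)", "AAV(2)"]

# Escaped alternation: parentheses are matched literally.
constraint = "|".join(re.escape(x) for x in samples)
print(constraint)  # AAV\(1\)|AAV\(2\)

assert re.fullmatch(constraint, "AAV(1)") is not None
assert re.fullmatch(constraint, "AAV(3)") is None

# Without re.escape the parentheses become regex groups, so the pattern
# matches "AAV1" and no longer matches the real sample name.
unescaped = "|".join(samples)
assert re.fullmatch(unescaped, "AAV1") is not None
assert re.fullmatch(unescaped, "AAV(1)") is None
```

So the escaping belongs in the constraint's regular expression, not in the wildcard value or the filename.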

If this doesn't help, can you post a reproducible example?

Upvotes: 0
