user10657934

Reputation: 157

Making a Snakefile for data analysis

I am making a Snakefile for data analysis. The extension of my raw data is .RCC; for example, the first input file I have is CF30207_01.RCC. The script I am running on the data is QC.py. Following the tutorial, I have made the following Snakefile:

SAMPLES = ["CF30207_01",
           "CF30212_06",
           "CF30209_03",
           "CF30213_07",
           "CF30211_05",
           "CF30214_08"]

rule all:
    input:
        expand('{sample}.RCC', sample=SAMPLES)


rule QC:
        input:
            rc = '/home/snakemaker/{sample}.RCC'

        output:
                '{sample}.pdf'
                "quality_control.csv"

        shell:
                "python3 QC.py"

But I got the following errors:

./Snakefile: line 1: SAMPLES: command not found
./Snakefile: line 2: CF30212_06,: command not found
./Snakefile: line 3: CF30209_03,: command not found
./Snakefile: line 4: CF30213_07,: command not found
./Snakefile: line 5: CF30211_05,: command not found
./Snakefile: line 6: CF30214_08]: command not found
./Snakefile: line 8: rule: command not found
./Snakefile: line 9: input:: command not found
./Snakefile: line 10: syntax error near unexpected token `'{sample}.RCC','
./Snakefile: line 10: `        expand('{sample}.RCC', sample=SAMPLES)'

But I followed exactly the same structure. Do you know how I can fix the problem with this Snakefile?

Upvotes: 0

Views: 426

Answers (2)

Troy Comi

Reputation: 2079

Welcome to Snakemake! You have a good start, but here are a couple of other notes on your Snakefile.

rule all:
    input:
        expand('{sample}.RCC', sample=SAMPLES)

The rule all should request the final outputs of your workflow, not the inputs. These are the files you are requesting to be made. Change the input to:

        expand('{sample}.pdf', sample=SAMPLES)
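
To make it concrete what rule all ends up requesting, here is a plain-Python sketch that mimics what expand produces for a single wildcard. It does not import Snakemake, and the name targets is only for illustration:

```python
SAMPLES = ["CF30207_01", "CF30212_06", "CF30209_03",
           "CF30213_07", "CF30211_05", "CF30214_08"]

# Plain-Python stand-in for expand('{sample}.pdf', sample=SAMPLES):
# substitute every sample name into the pattern, one target per sample.
targets = ["{sample}.pdf".format(sample=s) for s in SAMPLES]

for target in targets:
    print(target)
```

Each of these names matches the '{sample}.pdf' output of the QC rule, which is how Snakemake works out that it has to run QC once per sample.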

For the QC rule, it doesn't seem like you are passing the input/output files to the QC.py script. If the script accepts command line arguments, you can add them like:

   "python3 QC.py --input {input.rc} --output {output[0]}"

Alternatively, you can pass QC.py to the script directive and use snakemake.input[0], etc. to access the files in your Python code.
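
As a rough sketch of that second option: when a rule uses the script directive, Snakemake makes an object named snakemake available inside the script. The code below reads the paths from it and falls back to a stand-in object so it can also be run by hand. The helper resolve_paths and the fallback values are made up for illustration:

```python
from types import SimpleNamespace


def resolve_paths(smk):
    """Return (input path, output path) from a snakemake-like object."""
    return smk.input[0], smk.output[0]


# Stand-in used only when the script is run outside Snakemake (an assumption
# for this sketch); under the script directive, `snakemake` is already defined.
fallback = SimpleNamespace(input=["/home/snakemaker/CF30207_01.RCC"],
                           output=["CF30207_01.pdf"])
smk = globals().get("snakemake", fallback)

rcc_path, pdf_path = resolve_paths(smk)
print("reading", rcc_path, "and writing", pdf_path)
```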

Within the output:

output:
    '{sample}.pdf'
    "quality_control.csv"

You need to add a comma between the files to make them a list. Also note that every sample will write to the same quality_control.csv. At best this will overwrite the file and keep only the last sample; if several jobs run at the same time, you may get an error in your Python code. You may want something like:

output:
    '{sample}.pdf',
    'quality_control_{sample}.csv'

If your QC code actually appends to quality_control.csv, you can instead force a single execution at a time for that rule with custom resources.
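
A rough sketch of the custom-resource approach follows. The resource name qc_csv is made up for illustration, the --input/--output flags are the hypothetical ones from the shell command earlier, and quality_control.csv is deliberately left out of output on the assumption that QC.py appends to it as a side effect:

```
rule QC:
    input:
        rc='/home/snakemaker/{sample}.RCC'
    output:
        '{sample}.pdf'
    resources:
        # each QC job claims one unit of the (hypothetical) qc_csv resource
        qc_csv=1
    shell:
        "python3 QC.py --input {input.rc} --output {output}"
```

Running with a limit such as snakemake --cores 4 --resources qc_csv=1 should then let at most one QC job run at a time, because each job claims the whole budget of 1.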

A good test for new snakefiles is to run snakemake -nq to make sure the file syntax is ok and you have the expected number of rules queued up.

Upvotes: 2

dariober

Reputation: 9062

I guess you are executing the snakefile script itself as ./Snakefile. Instead, you should do

snakemake -s /path/to/Snakefile

Or just snakemake if the Snakefile is in the current directory.

Upvotes: 2
