How to make a rule with a directory as input and multiple directories/files as output?

Question

I want to build a workflow that converts BCL files from a sequencer into expression matrices using cellranger. I am new to Snakemake.

I copy the files from storage to the local machine, then run cellranger mkfastq in the shell to generate FASTQ files, which are stored in FASTQ/.

To generate an expression matrix from the FASTQ files, I have to pass the whole FASTQ directory to cellranger. cellranger then creates one directory per sample, where it stores the expression matrices, reports, logs and other files.

My pipeline:

samples = ['201', '202']
fc_name = '230119_FOO'
run = 'storage/vud/230119_FOO'

rule all:
        input:
                expand("RESULT/{fc_name}/{sample}", sample=samples, fc_name = fc_name)
                
#Copy from storage to local machine
rule copy:
        input:
                expand({run}, run=run)
        output:
                expand("BCL/{fc_name}", fc_name = fc_name)
        shell:
                "rsync -ah {run} BCL/"

#Make FASTQ files
rule mkfastq:
        input:
                fastq_run=expand("BCL/{fc_name}", fc_name = fc_name)
        output:
                expand("FASTQ/{fc_name}", fc_name = fc_name),
                expand("FASTQ/{fc_name}/outs/input_samplesheet.csv", fc_name = fc_name)
        shell:
                "cellranger mkfastq --run={input.fastq_run} --id={fc_name} --output-dir=FASTQ/"

# Make matrices
rule mkmat:
        input:
                expand("FASTQ/{fc_name}", fc_name = fc_name)
        output:
                expand("RESULT/{fc_name}/{sample}", sample=samples, fc_name=fc_name)
        shell:
                expand("cellranger count -id=RESULT/{{fc_name}}/{{samples}} --transcriptome=refdata-gex-mm10-2020-A/ --fastqs=FASTQ/230119_FOO --sample={{samples}}", samples = samples, fc_name=fc_name)

When I do a dry run of the pipeline, Snakemake throws an error:

 File "/miniconda3/envs/snakemake/lib/python3.11/site-packages/snakemake/jobs.py", line 521, in shellcmd
    self.format_wildcards(self.rule.shellcmd)
  File "/miniconda3/envs/snakemake/lib/python3.11/site-packages/snakemake/jobs.py", line 986, in format_wildcards
    f"{ex.__class__.__name__}: {ex}, when formatting the following:\n"
TypeError: can only concatenate str (not "list") to str
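
A likely reading of this traceback: the shell directive of mkmat receives the result of expand(), which is a list of strings, while Snakemake formats the shell command as a single string. The sketch below reproduces the same TypeError in plain Python. expand_like is a hypothetical stand-in for expand() so that the snippet runs without Snakemake installed, and the shortened command string is for illustration only.

```python
# Hypothetical stand-in for snakemake's expand(): returns one string per sample.
def expand_like(pattern, samples):
    return [pattern.format(sample=s) for s in samples]

# Like expand(), this yields a list, not a single command string.
cmd = expand_like("cellranger count --id={sample} --sample={sample}", ["201", "202"])

# Concatenating a str with that list fails in the same way as the traceback.
try:
    message = "when formatting the following:\n" + cmd
except TypeError as err:
    error_text = str(err)
    print(error_text)
```

This suggests the shell command of mkmat would need to be a plain string, with the sample supplied per job (for example through a wildcard) rather than through expand().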

How can I pass the directory "FASTQ/230119_FOO" to the mkmat rule and get this output:

├── RESULT
│   ├── 230119_FOO
│   │   ├── 201
│   │   │   ├── ...
│   │   ├── 202
│   │   │   ├── ...

How do I make a rule with a directory as input and multiple directories/files as output?
