Snakemake process multiple files in one rule

Question

What is the best way to process a list of files in one rule?

Workflow

The main goal of the workflow is to select from the raw data and write out the selected data. The workflow directory is structured as follows.

.
├── data
│   ├── 000_raw
│   │   ├── 15_a.csv
│   │   ├── 15_b.csv
│   │   ├── 15_c.csv
│   │   ├── 16_a.csv
│   │   ├── 16_b.csv
│   │   └── 16_c.csv
│   └── 010_sel
│       ├── 15_a.csv
│       ├── 15_b.csv
│       ├── 15_c.csv
│       ├── 16_a.csv
│       ├── 16_b.csv
│       └── 16_c.csv
├── scripts
│   └── 010_sel.py
└── Snakefile

The selection script 010_sel.py reads one file and produces one file per invocation, i.e. the usual way to run it is

python scripts/010_sel.py data/000_raw/15_a.csv data/010_sel/15_a.csv

Snakefile

I use expand together with a run block in the Snakefile.

import os

ls_year_type = ["15_a", "15_b", "15_c", "16_a", "16_b", "16_c"]

rule sel_010:
    input:
        expand("data/000_raw/{year_type}.csv", year_type=ls_year_type)
    output:
        expand("data/010_sel/{year_type}.csv", year_type=ls_year_type)
    run:
        # Call the script once per input/output pair
        for ifile in range(len(output)):
            os.system("python scripts/010_sel.py {} {}".format(input[ifile], output[ifile]))

Problems

There are two problems with this method.

1. The expand function generates a list. If only one of the files in the list changes, for example after rm data/010_sel/15_b.csv, Snakemake reruns the script on every file in the list, which is time-consuming.
2. If the script 010_sel.py is modified, Snakemake does not notice, so I have to rerun the workflow manually.
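
To make the first problem concrete, here is a minimal plain-Python sketch of what expand produces for a single wildcard. The helper expand_one is a hypothetical stand-in written for illustration, not Snakemake's own function. It only shows that the rule ends up with one list of six inputs and one list of six outputs, which Snakemake handles as a single job.

```python
# Plain-Python stand-in for what expand() does with a single wildcard.
# expand_one is a hypothetical helper, not part of Snakemake.
ls_year_type = ["15_a", "15_b", "15_c", "16_a", "16_b", "16_c"]


def expand_one(pattern, values):
    """Fill the {year_type} placeholder once per value and return a list."""
    return [pattern.format(year_type=v) for v in values]


inputs = expand_one("data/000_raw/{year_type}.csv", ls_year_type)
outputs = expand_one("data/010_sel/{year_type}.csv", ls_year_type)

# The rule sees these two lists as the inputs and outputs of one job.
for src, dst in zip(inputs, outputs):
    print(src, "->", dst)
```

Because all six outputs belong to the same job, a single missing output such as data/010_sel/15_b.csv causes the whole run block, and therefore all six script calls, to be executed again.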

Optional method

One option is to rewrite 010_sel.py so that it uses the snakemake object instead of sys.argv:

for i in range(len(snakemake.input)):
    input_file = snakemake.input[i]
    output_file = snakemake.output[i]

In the Snakefile, change run to script:

script:
    "scripts/010_sel.py"

This solves the second problem, but the first one remains.

Thanks in advance for any help.
