Reputation: 11
Sorry for the basic question; I am a junior developer trying to learn Snakemake together with click. In the example below, how can I pass a list of paths as the input of a rule, and then receive that list in a Python script?
Snakemake:
path_1 = 'data/raw/data2process/'
path_2 = 'data/raw/table.xlsx'
rule:
    input:
        list_of_pathes = "list of all paths to .xlsx/.csv/.xls files from path_1",
        other_table = path_2
    output:
        {some .xlsx file}
    shell:
        "script_1.py {input.list_of_pathes} {output}"
        "script_2.py {input.other_table} {output}"
script_1.py:
@click.command()
@click.argument("input_list_of_pathes", type=*??*)
@click.argument("out_path", type=click.Path())
def foo(input_list_of_pathes: list, out_path: str):
    df = pd.DataFrame()
    for path in input_list_of_pathes:
        table = pd.read_excel(path)
        # do something
        df = pd.concat([df, table])
    df.to_excel(out_path)
script_2.py:
@click.command()
@click.argument("input_path", type=click.Path(exists=True))
@click.argument("output_path", type=click.Path())
def foo_1(input_path: str, output_path: str):
    table = pd.read_excel(input_path)
    # do something
    table.to_excel(output_path)
Upvotes: 1
Views: 110
Reputation: 8194
Using pathlib and the glob method of a Path object, you could proceed as follows:
from itertools import chain
from pathlib import Path

path_1 = Path('data/raw/data2process/')
exts = ["xlsx", "csv", "xls"]
# One list of matching paths per extension
path_1_path_lists = [
    list(path_1.glob(f"*.{ext}"))
    for ext in exts]
# Flatten the list of lists into a single list of paths
path_1_all_paths = list(chain.from_iterable(path_1_path_lists))
chain.from_iterable "flattens" the list of lists, but I'm not sure Snakemake even needs this for the input of its rules.
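To see the flattening in isolation, here is a small sketch that runs against a throwaway directory. The helper name collect_tables and the file names are made up for the demo; only pathlib, itertools and tempfile from the standard library are used.

```python
import tempfile
from itertools import chain
from pathlib import Path


def collect_tables(folder, exts=("xlsx", "csv", "xls")):
    """Return a flat, sorted list of files in `folder` matching any extension."""
    # One glob iterator per extension, chained into a single sequence
    per_ext = [folder.glob(f"*.{ext}") for ext in exts]
    return sorted(chain.from_iterable(per_ext))


# Demo with empty placeholder files (hypothetical names)
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for name in ("a.xlsx", "b.csv", "c.xls", "notes.txt"):
        (folder / name).touch()
    found = [p.name for p in collect_tables(folder)]

print(found)  # ['a.xlsx', 'b.csv', 'c.xls']
```

Sorting is optional, but it keeps the order of the inputs stable between runs.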
Then, in your rule:
    input:
        list_of_paths = path_1_all_paths,
        other_table = path_2
I think that Path objects can be used directly. Otherwise, you need to turn them into strings with str:
    input:
        list_of_paths = [str(p) for p in path_1_all_paths],
        other_table = path_2
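For the click side of the question, one option is a variadic argument (nargs=-1), which collects every positional value except the ones claimed by the arguments after it. The following is only a sketch: the argument name input_paths is my own, and the body writes the received paths to a text file instead of doing the pandas concatenation, so that it runs without any data files. It is exercised with click's CliRunner rather than from the command line.

```python
import click
from click.testing import CliRunner


@click.command()
@click.argument("input_paths", nargs=-1, type=click.Path())
@click.argument("out_path", type=click.Path())
def foo(input_paths, out_path):
    """Receive any number of input paths followed by one output path."""
    # input_paths arrives as a tuple of strings; the pandas loop from the
    # question would iterate over it here instead of writing plain text
    with open(out_path, "w") as handle:
        for path in input_paths:
            handle.write(f"{path}\n")
    click.echo(f"{len(input_paths)} input(s) -> {out_path}")


# Simulate: script_1.py a.xlsx b.csv merged.txt (hypothetical file names)
runner = CliRunner()
with runner.isolated_filesystem():
    result = runner.invoke(foo, ["a.xlsx", "b.csv", "merged.txt"])

print(result.output)  # 2 input(s) -> merged.txt
```

In a shell directive, Snakemake expands a list input such as {input.list_of_paths} into space-separated paths, so a command of the form "python script_1.py {input.list_of_paths} {output}" lines up with this signature. Adding exists=True to the first click.Path would make click reject missing input files.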
Upvotes: 1