Reputation: 27
I want to run a pipeline for different files, but some of them don't need all of the defined nodes. How can I tell Kedro which nodes to run (or skip) for each file?
Upvotes: 0
Views: 1957
Reputation: 1578
You can also use the --to-nodes CLI option: kedro run --to-nodes node1,node2. Internally this calls pipeline.to_nodes("node1", "node2") (see the method docs), which keeps the named nodes plus every node they depend on, directly or transitively. Please note that you would still need to identify the "final" list of nodes that have to be run.
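To see why you still have to pick the "final" nodes, here is a simplified pure-Python model of what to_nodes does: keep the named nodes and walk upstream through whatever produces their inputs. ToyNode and to_nodes below are hypothetical stand-ins for illustration, not Kedro's own classes or implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyNode:
    """Hypothetical stand-in for a Kedro node: a name plus dataset names."""
    name: str
    inputs: tuple
    outputs: tuple


def to_nodes(nodes, *target_names):
    """Keep the target nodes plus every node they depend on, directly or transitively."""
    producer = {out: n for n in nodes for out in n.outputs}  # dataset -> node that makes it
    by_name = {n.name: n for n in nodes}
    missing = [t for t in target_names if t not in by_name]
    if missing:
        raise ValueError(f"Unknown node(s): {missing}")
    keep, stack = set(), [by_name[t] for t in target_names]
    while stack:
        current = stack.pop()
        if current.name in keep:
            continue
        keep.add(current.name)
        # walk upstream through whichever node produces each input
        stack.extend(producer[i] for i in current.inputs if i in producer)
    return [n for n in nodes if n.name in keep]  # preserve declaration order


pipeline = [
    ToyNode("clean", ("raw",), ("clean_data",)),
    ToyNode("features", ("clean_data",), ("features",)),
    ToyNode("train", ("features",), ("model",)),
    ToyNode("report", ("clean_data",), ("report",)),
]
print([n.name for n in to_nodes(pipeline, "features", "report")])
# → ['clean', 'features', 'report']
```

Asking for "features" and "report" drops "train" because nothing requested depends on it; if you only named "report", "features" would be dropped too.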
Upvotes: 1
Reputation: 563
To filter out a few nodes of a pipeline, you can simply filter the pipeline's node list from inside Python; my favorite way is to use a list comprehension.
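from kedro.pipeline import Pipeline
from kedro.runner import SequentialRunner
By name:
# keep every node whose name does not contain 'dont_run_me'
nodes_to_run = [node for node in pipeline.nodes if 'dont_run_me' not in node.name]
SequentialRunner().run(Pipeline(nodes_to_run), io)
By tag:
# keep every node that is not tagged 'dont_run_tag'
nodes_to_run = [node for node in pipeline.nodes if 'dont_run_tag' not in node.tags]
SequentialRunner().run(Pipeline(nodes_to_run), io)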
You can filter on any attribute of a node (name, inputs, outputs, short_name, tags).
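One thing to watch when you drop nodes: a dataset that only a removed node produced becomes an input the remaining nodes must load from the catalog. This sketch filters on inputs and lists those now-unproduced datasets; ToyNode and free_inputs are hypothetical stand-ins, while for a real pipeline Kedro's Pipeline.inputs() reports the equivalent set of free inputs.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Hypothetical stand-in for a Kedro node (name, inputs, outputs, tags)."""
    name: str
    inputs: list
    outputs: list
    tags: set = field(default_factory=set)


def free_inputs(nodes):
    """Datasets consumed by some node but produced by none, so they must come from the catalog."""
    produced = {o for n in nodes for o in n.outputs}
    return {i for n in nodes for i in n.inputs} - produced


nodes = [
    ToyNode("load", ["raw_a"], ["table_a"]),
    ToyNode("load_b", ["raw_b"], ["table_b"], {"optional"}),
    ToyNode("join", ["table_a", "table_b"], ["joined"]),
]

# filter on an attribute other than name/tag: drop every node that reads raw_b
kept = [n for n in nodes if "raw_b" not in n.inputs]
print([n.name for n in kept])     # → ['load', 'join']
print(sorted(free_inputs(kept)))  # → ['raw_a', 'table_b']
```

Here table_b is no longer produced by any kept node, so a real run would likely fail unless table_b is already persisted and registered in the catalog.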
If you need to run your pipeline this way in production or from the command line, you can either tag the nodes and run with kedro run --tag, or add a custom click.option to the run function inside kedro_cli.py and apply this filter when the flag is True.
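A minimal sketch of the custom click.option idea, assuming click is installed. The --skip-optional flag, the 'optional' tag, NODE_TAGS and drop_tagged are hypothetical names for illustration; the real run command in kedro_cli.py has many more options and works on actual node objects.

```python
import click


def drop_tagged(node_tags, skip_tag):
    """Hypothetical helper: keep the names of nodes that do not carry skip_tag."""
    return [name for name, tags in node_tags.items() if skip_tag not in tags]


# Hypothetical node -> tags mapping standing in for a loaded Kedro pipeline
NODE_TAGS = {"clean": set(), "train": set(), "plot": {"optional"}}


@click.command()
@click.option("--skip-optional", is_flag=True, default=False,
              help="Leave out nodes tagged 'optional'.")
def run(skip_optional):
    names = list(NODE_TAGS)
    if skip_optional:
        # apply the filter only when the flag is passed
        names = drop_tagged(NODE_TAGS, "optional")
    # a real kedro_cli.py would build a Pipeline from these nodes and run it
    click.echo(",".join(names))
```

Invoked without the flag this echoes all three names; with --skip-optional it echoes only clean,train.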
Note: this assumes that you have your pipeline loaded into memory as pipeline and your catalog loaded as io.
Upvotes: 1
Reputation: 1033
Would modular pipelines help here? You could build two pipelines, one consisting of just the two "optional" nodes and the other containing the rest, then return the default pipeline as the sum of the two. Something like this:
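from kedro.pipeline import Pipeline, node

def create_pipelines(**kwargs):
    # node arguments (func, inputs, outputs) elided, as in the original sketch
    two_node_pipeline = Pipeline([node(...), node(...)])  # just the two "optional" nodes
    rest_of_pipeline = Pipeline([node(...), node(...), node(...), node(...)])
    return {
        "rest_of_pipeline": rest_of_pipeline,
        "__default__": two_node_pipeline + rest_of_pipeline,  # all six nodes
    }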
Then you can run kedro run --pipeline rest_of_pipeline to run the pipeline without those two nodes, or plain kedro run to run the pipeline with the extra two nodes.
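A toy model of how the registry above is used, with pipelines represented as plain sets of node names. create_pipelines here and the select helper are hypothetical stand-ins: select only roughly mimics how kedro run picks "__default__" when no --pipeline name is given, and the set union mirrors Pipeline + Pipeline combining both node lists.

```python
def create_pipelines():
    """Hypothetical registry: each pipeline modelled as a set of node names."""
    two_node_pipeline = {"optional_a", "optional_b"}
    rest_of_pipeline = {"load", "clean", "train", "evaluate"}
    return {
        "rest_of_pipeline": rest_of_pipeline,
        # like Pipeline + Pipeline, the default holds the nodes of both
        "__default__": two_node_pipeline | rest_of_pipeline,
    }


def select(registry, name=None):
    """Pick a pipeline by name; no name falls back to "__default__"."""
    return registry[name or "__default__"]


registry = create_pipelines()
print(len(select(registry)))                         # → 6
print(sorted(select(registry, "rest_of_pipeline")))  # → ['clean', 'evaluate', 'load', 'train']
```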
Otherwise, I think that if you modify your kedro_cli, ProjectContext or run.py (whichever your project uses), it should be fairly easy to add the --except functionality yourself. I might look into doing this...
Kedro sorts the nodes automatically, in topological order of their data dependencies (using toposort); see this previous answer: How to run the nodes in sequence as declared in kedro pipeline?
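To illustrate why declaration order doesn't matter: dependencies are implied by dataset names, so any topological sort puts each producer before its consumers. This sketch uses Python's standard-library graphlib (3.9+) on hypothetical nodes; it is not Kedro's internal code.

```python
from graphlib import TopologicalSorter

# Hypothetical nodes, declared deliberately out of order: name -> (inputs, outputs)
nodes = {
    "train": (["features"], ["model"]),
    "clean": (["raw"], ["clean_data"]),
    "features": (["clean_data"], ["features"]),
}

# A node depends on whichever node produces one of its inputs
producer = {out: name for name, (_, outs) in nodes.items() for out in outs}
graph = {
    name: {producer[i] for i in ins if i in producer}
    for name, (ins, _) in nodes.items()
}

order = list(TopologicalSorter(graph).static_order())
print(order)  # → ['clean', 'features', 'train']
```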
Upvotes: 1