sofiacosta29

Reputation: 27

How to run a pipeline except for a few nodes?

I want to run a pipeline for different files, but some of them don't need all of the defined nodes. How can I skip those nodes?

Upvotes: 0

Views: 1957

Answers (3)

Dmitry Deryabin

Reputation: 1578

You can also use the --to-nodes CLI option: kedro run --to-nodes node1,node2. Internally this calls pipeline.to_nodes("node1", "node2") (see the method docs), which keeps the named nodes plus every node they depend on, directly or transitively. Please note that you would still need to identify the "final" list of nodes that have to be run.
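To make the "up to these nodes" idea concrete, here is a minimal sketch in plain Python. It is not Kedro's implementation: the DEPENDENCIES map, the node names, and the nodes_up_to helper are all hypothetical and only illustrate selecting a target together with everything upstream of it.

```python
# Hypothetical dependency map: node name -> names of the nodes it depends on.
# Plain Python for illustration only, not Kedro's Pipeline class.
DEPENDENCIES = {
    "load": [],
    "clean": ["load"],
    "features": ["clean"],
    "train": ["features"],
    "report": ["train"],
}


def nodes_up_to(targets, dependencies):
    """Return the targets plus every node they depend on, directly or not."""
    selected = set()
    stack = list(targets)
    while stack:
        name = stack.pop()
        if name in selected:
            continue  # already visited via another path
        selected.add(name)
        stack.extend(dependencies[name])
    return selected


selected = nodes_up_to(["features"], DEPENDENCIES)
print(sorted(selected))  # ['clean', 'features', 'load']
```

Everything downstream of the target ("train" and "report" here) is left out, which is why you still have to pick the right "final" nodes yourself.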

Upvotes: 1

Waylon Walker

Reputation: 563

To filter a few nodes out of a pipeline, you can simply filter the pipeline's node list from inside Python. My favorite way is to use a list comprehension.

by name

nodes_to_run = [node for node in pipeline.nodes if 'dont_run_me' not in node.name]
run(nodes_to_run, io)

by tag

nodes_to_run = [node for node in pipeline.nodes if 'dont_run_tag' not in node.tags]
run(nodes_to_run, io)

It's possible to filter by any attribute tied to the pipeline node (name, inputs, outputs, short_name, tags).
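If you want to try the filtering logic without a Kedro project at hand, a self-contained sketch could look like the following. FakeNode and exclude are hypothetical stand-ins that only mimic the name and tags attributes; they are not part of Kedro. Note that this version matches names exactly, whereas the snippets above use a substring check.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FakeNode:
    """Hypothetical stand-in exposing only `name` and `tags`."""
    name: str
    tags: frozenset = field(default_factory=frozenset)


def exclude(nodes, names=(), tags=()):
    """Keep nodes whose name is not in `names` and that carry none of `tags`."""
    names, tags = set(names), set(tags)
    return [n for n in nodes if n.name not in names and not (tags & n.tags)]


nodes = [
    FakeNode("load_data"),
    FakeNode("plot_debug", frozenset({"dont_run_tag"})),
    FakeNode("train_model", frozenset({"ml"})),
]

print([n.name for n in exclude(nodes, names=["plot_debug"])])
# ['load_data', 'train_model']
print([n.name for n in exclude(nodes, tags=["dont_run_tag"])])
# ['load_data', 'train_model']
```

The same comprehension should carry over to real nodes, as long as the attributes you filter on exist on them.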

If you need to run your pipeline this way in production or from the command line, you can either tag the nodes and run by tag, or add a custom click.option to your run function inside of kedro_cli.py and apply this filter when the flag is True.

Note: This assumes that you have your pipeline loaded into memory as pipeline and your catalog loaded as io.

Upvotes: 1

Zain Patel

Reputation: 1033

Would modular pipelines help here? You could build two pipelines, one consisting of just the two "optional" nodes and the other containing the rest, then make the default pipeline the sum of the two. Something like this:

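from kedro.pipeline import Pipeline, node

def create_pipelines(**kwargs):
    # node() is a placeholder for your real node definitions
    two_node_pipeline = Pipeline([node(), node()])
    rest_of_pipeline = Pipeline([node(), node(), node(), node()])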

    return {
        "rest_of_pipeline": rest_of_pipeline,
        "__default__": two_node_pipeline + rest_of_pipeline,
    }

Then you can do kedro run --pipeline rest_of_pipeline to run the pipeline without those two nodes or kedro run to run the pipeline with the extra two nodes.

Otherwise, I think if you modify your kedro_cli or ProjectContext or run.py, whatever it is, it should be fairly easy to add in the --except functionality yourself. I might look into doing this...
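As a rough idea of what such an --except option could do, here is a sketch. Everything in it is an assumption: the --except flag does not exist in Kedro, parse_except and select are hypothetical helpers, and argparse is used only to keep the example self-contained (Kedro's own CLI is built on click).

```python
import argparse


def parse_except(argv):
    """Parse a hypothetical --except flag into a list of node names."""
    parser = argparse.ArgumentParser(prog="run")
    # `except` is a Python keyword, so store the value under another name.
    parser.add_argument("--except", dest="except_nodes", default="",
                        help="comma-separated node names to skip")
    args = parser.parse_args(argv)
    return [name.strip() for name in args.except_nodes.split(",") if name.strip()]


def select(all_names, excluded):
    """Return the names to run, failing loudly on unknown exclusions."""
    unknown = set(excluded) - set(all_names)
    if unknown:
        raise ValueError(f"Unknown node names: {sorted(unknown)}")
    return [name for name in all_names if name not in excluded]


excluded = parse_except(["--except", "plot, report"])
print(select(["load", "clean", "plot", "report"], excluded))  # ['load', 'clean']
```

Rejecting unknown names is a design choice: a typo in the flag would otherwise silently run the node you meant to skip.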

Kedro will do the sorting of the nodes automatically, according to toposort, see this previous answer: How to run the nodes in sequence as declared in kedro pipeline?
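For intuition about what a topological sort does, here is a small sketch using the standard library's graphlib (Python 3.9+). It is not how Kedro does it internally, and the node names are made up. It only shows that the execution order follows the dependencies, not the order in which the nodes were declared.

```python
from graphlib import TopologicalSorter

# Hypothetical nodes, deliberately declared in the "wrong" order.
# Each key maps to the set of nodes it depends on.
graph = {
    "train": {"features"},
    "features": {"clean"},
    "clean": {"load"},
    "load": set(),
}

order = list(TopologicalSorter(graph).static_order())
print(order)  # ['load', 'clean', 'features', 'train']
```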

Upvotes: 1
