idanov

Reputation: 156

How to run parts of your Kedro pipeline conditionally?

I have a big pipeline that takes a few hours to run. A small part of it needs to run quite often. How do I run just that part without triggering the entire pipeline?

Upvotes: 7

Views: 11107

Answers (2)

Waylon Walker

Reputation: 563

I would recommend getting your tags or pipelines set up to run correctly from the CLI, as @idanov suggested. It will be much easier in the long run when you move to production. I would also add that you can do quite a bit of ad hoc pipeline trimming and running inside Python. Here are some examples.

🔖 filter by tags

nodes = pipeline.only_nodes_with_tags('cars')

filter by node

nodes = pipeline.only_nodes('b_int_cars')

filter nodes whose name contains a string

query_string = 'cars'
# Collect the names of all nodes whose name contains the query string
node_names = [
    node.name
    for node in pipeline.nodes
    if query_string in node.name
]
pipeline.only_nodes(*node_names)

only nodes with any of the tags (or)

nodes = pipeline.only_nodes_with_tags('cars', 'trains')

only nodes with all of the tags (and)

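# Option 1: intersect two tag-filtered pipelines
raw_nodes = pipeline.only_nodes_with_tags('raw')
car_nodes = pipeline.only_nodes_with_tags('cars')
raw_car_nodes = raw_nodes & car_nodes

# Option 2: chain the filters, which selects the same nodes
raw_car_nodes = (
    pipeline
    .only_nodes_with_tags('raw')
    .only_nodes_with_tags('cars')
)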

add pipelines

car_nodes = pipeline.only_nodes_with_tags('cars')
train_nodes = pipeline.only_nodes_with_tags('trains')
transportation_nodes = car_nodes + train_nodes

The above are snippets from my personal Kedro notes.
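
To make the "or" versus "and" tag semantics concrete, here is a rough plain-Python model. It does not import Kedro: the `NODE_TAGS` mapping, the node names, and the two helper functions are all hypothetical stand-ins, with a dict of sets playing the role of a pipeline. Passing several tags at once behaves like `with_any_tag`, while intersecting or chaining single-tag filters behaves like `with_all_tags`.

```python
# Plain-Python model of tag-based node selection (no Kedro import).
# Node names and tags are hypothetical, chosen only for illustration.
NODE_TAGS = {
    "raw_cars": {"raw", "cars"},
    "raw_trains": {"raw", "trains"},
    "int_cars": {"int", "cars"},
}


def with_any_tag(node_tags, *tags):
    """Return names of nodes carrying at least one of the given tags (OR)."""
    wanted = set(tags)
    return {name for name, tagset in node_tags.items() if tagset & wanted}


def with_all_tags(node_tags, *tags):
    """Return names of nodes carrying every one of the given tags (AND)."""
    wanted = set(tags)
    return {name for name, tagset in node_tags.items() if wanted <= tagset}


# OR: any node tagged 'cars' or 'trains'
cars_or_trains = with_any_tag(NODE_TAGS, "cars", "trains")

# AND: only nodes tagged both 'raw' and 'cars'
raw_and_cars = with_all_tags(NODE_TAGS, "raw", "cars")

# Intersecting two single-tag selections gives the same result as AND
raw_and_cars_via_intersection = (
    with_any_tag(NODE_TAGS, "raw") & with_any_tag(NODE_TAGS, "cars")
)

print(sorted(cars_or_trains))
print(sorted(raw_and_cars))
```

In this toy data set the "or" selection picks up all three nodes, while the "and" selection narrows down to the single node that carries both tags.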

Upvotes: 1

idanov

Reputation: 156

There are multiple ways to specify which nodes or parts of your pipeline to run.

  1. Use kedro run parameters like --to-nodes/--from-nodes/--node to explicitly define what needs to be run.

  2. In kedro>=0.15.2 you can define multiple pipelines, and then run only one of them with kedro run --pipeline <name>. If no --pipeline parameter is specified, the default pipeline is run. The default pipeline might combine several other pipelines. More information about using modular pipelines: https://kedro.readthedocs.io/en/latest/04_user_guide/06_pipelines.html#modular-pipelines

  3. Use tags. Tag a small portion of your pipeline with something like "small", and then do kedro run --tag small. Read more here: https://kedro.readthedocs.io/en/latest/04_user_guide/05_nodes.html#tagging-nodes

Upvotes: 6
