lampShadesDrifter

Reputation: 4159

Way to designate certain set of airflow tasks to run before others (order invariant)?

I have an Airflow (v1.10.5) DAG that looks like this:

enter image description here

Is there a way to specify that all of the blue tasks should complete before the scheduler moves on to any downstream tasks? Currently the scheduler sometimes goes down an entire branch of tasks before running the next blue task.

I want to avoid just putting them in sequence (with the trigger rule TriggerRule.ALL_DONE) because there is no logical order in which they need to run, other than that they must all be done before any downstream task in any branch.

Anyone know of any way to do this (like some kind of "priority" pool for tasks)? Other workaround suggestions?

Upvotes: 1

Views: 635

Answers (1)

lampShadesDrifter

Reputation: 4159

I asked this question on the Airflow mailing list and this was the reply:

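from airflow.utils.helpers import cross_downstream  # location in Airflow 1.10.x

# white, blue_*, green_* and yellow_* are tasks already defined in the DAG
blue = [blue_a, blue_b, blue_c]
green = [green_a, green_b, green_c]
yellow = [yellow_a, yellow_b]

# every task in from_tasks becomes upstream of every task in to_tasks
cross_downstream(from_tasks=[white], to_tasks=blue)
cross_downstream(from_tasks=blue, to_tasks=green)
cross_downstream(from_tasks=green, to_tasks=yellow)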

This should create the required network of dependencies between tasks.

A visualization is available here:
https://imgur.com/a/2jqyqQO

This is the easiest solution and in my opinion the correct one.
However, if you don't want these dependencies then you can create a new
schedule rule by editing the BaseOperator.deps property.

The docs for this helper dag building function can be found here: https://airflow.apache.org/docs/stable/concepts.html#relationship-helper
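To make it concrete what the cross_downstream calls above wire up, here is a small plain-Python sketch (no Airflow import). The cross_edges helper and the string task names are stand-ins made up for illustration; the only thing taken from Airflow is the behaviour that each task in from_tasks becomes upstream of each task in to_tasks.

```python
from itertools import product


def cross_edges(from_tasks, to_tasks):
    """Hypothetical helper: the (upstream, downstream) pairs of a full cross-dependency."""
    return list(product(from_tasks, to_tasks))


# Strings stand in for the operators of the DAG in the question.
blue = ["blue_a", "blue_b", "blue_c"]
green = ["green_a", "green_b", "green_c"]
yellow = ["yellow_a", "yellow_b"]

edges = (
    cross_edges(["white"], blue)
    + cross_edges(blue, green)
    + cross_edges(green, yellow)
)

# Collect the upstream set of each task; a task with the default trigger
# rule only becomes eligible once every task in this set has succeeded.
upstream = {}
for up, down in edges:
    upstream.setdefault(down, set()).add(up)

print(len(edges))                   # 3 + 9 + 6 = 18 edges
print(sorted(upstream["green_a"]))  # ['blue_a', 'blue_b', 'blue_c']
```

Every green ends up depending on all three blues, which is what forces the blues to finish first, and also why the default rule ties each green to the success of every blue.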

Which was a useful solution, but...

One thing about my case is that the next tasks (greens) in each branch should only run if the blue task in that same branch completes successfully (should not care about the success/failure status of the other blue tasks, only that they have been run). Thus I don't think the ALL_DONE trigger rule will help the greens and ALL_SUCCESS would be too strict.

Any ideas for such a thing?

After some more thought, here is my workaround...

enter image description here
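One arrangement that meets this requirement, which may or may not match the picture above, is a "barrier" task between the blues and the greens: the barrier is downstream of every blue with the all_done rule, and each green is downstream of both its own blue and the barrier with the default all_success rule. The sketch below is a toy model of those two trigger rules in plain Python, not Airflow code; the task names, the resolve function and the simulated failure of blue_b are all assumptions for illustration, and skipped states are ignored.

```python
SUCCESS, FAILED, UPSTREAM_FAILED = "success", "failed", "upstream_failed"


def resolve(task, upstream, rule, outcomes, states):
    """Toy model: final state of `task` once all its upstream tasks are finished."""
    up_states = [states[u] for u in upstream]
    if rule == "all_success" and any(s != SUCCESS for s in up_states):
        return UPSTREAM_FAILED  # the task itself is never run
    # all_done (or all upstream succeeded): the task runs and gets its own outcome
    return outcomes.get(task, SUCCESS)


branches = ["a", "b", "c"]
dag = {}  # task -> (upstream tasks, trigger rule)
for x in branches:
    dag["blue_" + x] = ([], "all_success")
# The barrier waits for every blue, whatever their result.
dag["barrier"] = (["blue_" + x for x in branches], "all_done")
# Each green needs its own blue to succeed and the barrier to have run.
for x in branches:
    dag["green_" + x] = (["blue_" + x, "barrier"], "all_success")

outcomes = {"blue_b": FAILED}  # hypothetical run in which one blue fails

states = {}
for task, (upstream, rule) in dag.items():  # insertion order is topological here
    states[task] = resolve(task, upstream, rule, outcomes, states)

for x in branches:
    print("green_" + x, states["green_" + x])
```

In Airflow 1.10 terms the barrier could be a DummyOperator created with trigger_rule=TriggerRule.ALL_DONE, with the greens left on their default rule. In this model green_a and green_c run while green_b is marked upstream_failed, and no green can start before all blues have finished.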

Upvotes: 1
