Reputation: 152
Okay, I apologize if this is a dumb question, because it seems so obvious that it should work. But I can't find it documented, and as we examine our options for building a new data pipeline, I really want this to be a feature...
Can multiple downstream processes be dependent on a single upstream process, where the upstream process only runs once? In other words, can I extract a table one time, load it into my data warehouse, and have multiple aggregations that depend on that load being complete?
For a bit more information, we are attempting to move to an asynchronous extract-load-transform, where the extract is started first and the loads and transforms finish as soon as they have the subset of tables they need from the extract.
Upvotes: 2
Views: 6593
Reputation: 111
If I'm understanding the question, yes, you can set downstream tasks to be dependent on the success of an upstream task.
We use DummyOperators in a lot of cases similar to the sample DAG sketched below.
In this case we want the DummyOperator to kick off first, before the downstream tasks start. This also makes clearing out failed runs easier, since we can clear the dummy operator and its downstream tasks at the same time.
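A minimal sketch of that pattern (the DAG id, task names, and callables are made up for illustration, and the import paths follow the Airflow 1.x layout used at the time):

from datetime import datetime

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.python_operator import PythonOperator


def load_orders():
    # placeholder: load the orders table into the warehouse
    pass


def load_customers():
    # placeholder: load the customers table into the warehouse
    pass


dag = DAG(
    dag_id='extract_then_fan_out',
    start_date=datetime(2018, 1, 1),
    schedule_interval='@daily',
)

# Upstream "gate" task that must succeed before anything else runs
extract_done = DummyOperator(task_id='extract_done', dag=dag)

load_orders_task = PythonOperator(
    task_id='load_orders',
    python_callable=load_orders,
    dag=dag,
)

load_customers_task = PythonOperator(
    task_id='load_customers',
    python_callable=load_customers,
    dag=dag,
)

# Both downstream tasks wait on the single upstream task
extract_done.set_downstream([load_orders_task, load_customers_task])

Clearing extract_done together with its downstream tasks then re-queues the whole fan-out in one go.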
You can also use the depends_on_past=True
parameter, which requires that a task's previous scheduled run succeeded before the task is queued again. Within a single run, downstream tasks already wait for their upstream tasks; whether they run or are skipped based on the upstream task's outcome is controlled by trigger rules.
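For instance, it is commonly set through default_args (a sketch continuing the example above; the values are illustrative):

default_args = {
    'owner': 'airflow',
    # Each task instance waits for its own previous scheduled run to have succeeded
    'depends_on_past': True,
}

dag = DAG(
    dag_id='extract_then_fan_out',
    default_args=default_args,
    start_date=datetime(2018, 1, 1),
    schedule_interval='@daily',
)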
Upvotes: 3
Reputation: 8249
This seems to me like a normal DAG with unusual wording. I understand the required structure like this:
extract_table_task \
                    |- task1_do_stuff
                    |- task2_do_some_other_stuff
                    |- task3_...
Or in Airflow code:
extract_table_task.set_downstream(task1_do_stuff)
extract_table_task.set_downstream(task2_do_some_other_stuff)
extract_table_task.set_downstream(task3_...)
Then make sure to select the correct trigger rules for your workflow, e.g. if some tasks should run even if something went wrong: https://airflow.apache.org/concepts.html#trigger-rules
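For example, a sketch reusing the task names above (and assuming they belong to a DAG object named dag): a cleanup task with trigger_rule='all_done' runs once all of its upstream tasks have finished, whether they succeeded or failed.

from airflow.operators.dummy_operator import DummyOperator

# Runs after every direct upstream task has finished, regardless of success or failure
cleanup = DummyOperator(
    task_id='cleanup',
    trigger_rule='all_done',
    dag=dag,
)

task1_do_stuff.set_downstream(cleanup)
task2_do_some_other_stuff.set_downstream(cleanup)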
Upvotes: 3