Reputation: 2101
Currently I have a DAG consisting of 4 operators, as shown below:
from airflow import DAG
from airflow.operators.python_operator import PythonOperator, BranchPythonOperator

with DAG('dag', default_args=args, schedule_interval=schedule_interval, catchup=True) as dag:
    main_dag = PythonOperator(
        task_id='main',
        python_callable=func,
        provide_context=True,
        dag=dag)
    run_after_main_dag_1 = PythonOperator(
        task_id='1',
        python_callable=foo,
        provide_context=True,
        dag=dag)
    run_after_main_dag_2 = BranchPythonOperator(
        task_id='2',
        python_callable=foo,
        provide_context=True,
        dag=dag)
    run_after_main_dag_2_2 = PythonOperator(
        task_id='3',
        python_callable=foo,
        provide_context=False,
        dag=dag)

    # this runs sequentially, but shouldn't
    main_dag >> run_after_main_dag_1 >> run_after_main_dag_2 >> run_after_main_dag_2_2
Here's what I'd like to achieve:
1. Run the main_dag operator.
2. Once main_dag is finished, start run_after_main_dag_1 and run_after_main_dag_2 in parallel, as they are not dependent on each other.
I simply can't find how to achieve this anywhere in the docs. There must be a simple syntax I have completely overlooked.
Does anyone know how to make this happen?
Upvotes: 2
Views: 1850
Reputation: 8273
In Airflow, >> and << are used to set up downstream and upstream dependencies.
Your code
main_dag >> run_after_main_dag_1 >> run_after_main_dag_2 >> run_after_main_dag_2_2  # sequential
actually defines a relationship that runs sequentially: run_after_main_dag_1's upstream is set to main_dag, and so on.
To run run_after_main_dag_1 and run_after_main_dag_2 in parallel, define the relationships so that both have main_dag as their upstream task:
main_dag >> run_after_main_dag_1  # depends only on main_dag
main_dag >> run_after_main_dag_2  # depends only on main_dag
Airflow will then kick off the two tasks in parallel once the main_dag task finishes its execution.
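Equivalently, the bitshift operators also accept a list of tasks, so the fan-out can be written in a single line. A minimal sketch, reusing the task variables from the question:

main_dag >> [run_after_main_dag_1, run_after_main_dag_2]  # both tasks start once main_dag finishes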
Upvotes: 0
Reputation: 2101
So there was a simple answer:
main_dag >> run_after_main_dag_1
main_dag >> run_after_main_dag_2 >> run_after_main_dag_2_2
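For reference, the same dependencies can also be set with the explicit set_downstream helper instead of the bitshift syntax; a minimal equivalent sketch:

main_dag.set_downstream(run_after_main_dag_1)
main_dag.set_downstream(run_after_main_dag_2)
run_after_main_dag_2.set_downstream(run_after_main_dag_2_2)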
Upvotes: 1