Reputation: 582
I have dagA (cron 5am) and dagB (cron 6am). Both of these ingest the data from somewhere and dump into the datalake. Now I want dagC (an ETL job) to wait for both dagA and dagB to complete.
I am using an ExternalTaskSensor instead of a TriggerDagRunOperator since I don't believe the ingestion layer should trigger anything downstream. I've read similar questions stating I should run the dags at the same time.
Now, this part confuses me because if I am to follow this, does this mean all my airflow jobs will start at the same time and the downstream jobs keep poking until the upstream is ready? Does that also mean dagA and dagB have to start at the same time even though they have no dependency between each other?
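from datetime import datetime

from airflow import DAG
from airflow.sensors.external_task_sensor import ExternalTaskSensor

# Ingestion DAGs, each on its own daily cron
dagA = DAG('dagA', description='dagA',
           schedule_interval='0 5 * * *',
           start_date=datetime(2017, 3, 20), catchup=False)
dagB = DAG('dagB', description='dagB',
           schedule_interval='0 6 * * *',
           start_date=datetime(2017, 3, 20), catchup=False)
# ETL DAG, currently without a schedule of its own
dagC = DAG('dagC', description='dagC',
           schedule_interval=None,
           start_date=datetime(2017, 3, 20), catchup=False)

# Sensors belong to dagC; external_task_id=None waits for the whole external DAG
wait_for_dagA = ExternalTaskSensor(
    task_id='wait_for_dagA',
    external_dag_id='dagA',
    external_task_id=None,
    execution_delta=None,
    dag=dagC)
wait_for_dagB = ExternalTaskSensor(
    task_id='wait_for_dagB',
    external_dag_id='dagB',
    external_task_id=None,
    execution_delta=None,
    dag=dagC)

# etl_task (not shown) is the ETL task in dagC
[wait_for_dagA, wait_for_dagB] >> etl_task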
I am on airflow 1.10.3.
Upvotes: 0
Views: 6320
Reputation: 11607
..does this mean all my airflow jobs will start at the same time and the downstream jobs keep poking until the upstream is ready?
etl_task (and its downstream dependencies) will start only after both wait_for_dagA and wait_for_dagB succeed. These waiting tasks will keep poking (that's what sensors do) until the respective DAGs succeed.
Does that also mean dagA and dagB have to start at the same time even though they have no dependency between each other?
As already said above, this is not a requirement. The entire idea of replacing crons with DAGs is that you don't need to time your tasks accurately; rather, you have the flexibility of forcing them to run one after another irrespective of different start times, execution times and unexpected delays.
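To make the point about differing start times concrete, here is a rough sketch of the date arithmetic behind execution_delta, the sensor parameter left as None in the question. By default an ExternalTaskSensor looks for an external run with the same execution date as its own; with execution_delta set, it looks for its own execution date minus that delta. The sketch assumes dagC is given its own daily cron at 07:00 (with schedule_interval=None it only runs when triggered manually); that schedule and the helper names below are illustrative, not part of the original setup.

```python
from datetime import datetime, timedelta

# Assumption: dagC runs daily at 07:00 ('0 7 * * *'), after dagA (05:00)
# and dagB (06:00). The question itself leaves dagC unscheduled.
DAGC_HOUR = 7
UPSTREAM_HOURS = {"dagA": 5, "dagB": 6}


def execution_deltas(downstream_hour, upstream_hours):
    """Hypothetical helper: the execution_delta each sensor would need."""
    return {
        dag_id: timedelta(hours=downstream_hour - hour)
        for dag_id, hour in upstream_hours.items()
    }


def external_execution_date(execution_date, execution_delta):
    """Execution date the sensor checks: its own date minus the delta."""
    return execution_date - execution_delta


deltas = execution_deltas(DAGC_HOUR, UPSTREAM_HOURS)
dagc_execution_date = datetime(2017, 3, 20, 7, 0)

for dag_id in sorted(deltas):
    looked_up = external_execution_date(dagc_execution_date, deltas[dag_id])
    print(dag_id, deltas[dag_id], looked_up)
```

Under these assumptions, wait_for_dagA would be given execution_delta=timedelta(hours=2) and wait_for_dagB execution_delta=timedelta(hours=1), so each sensor matches the upstream run from the same day even though the three DAGs start at different times.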
Tips
Have a look at the mode param of ExternalTaskSensor.
If you use external_task_id in your sensor(s), beware of pitfalls like this.
Upvotes: 1