Reputation: 923
I have a Python DAG Parent Job and a DAG Child Job. The tasks in the Child Job should be triggered on the successful completion of the Parent Job tasks, which run daily. How can I add an external job trigger?
My code:
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.postgres_operator import PostgresOperator
from utils import FAILURE_EMAILS
yesterday = datetime.combine(datetime.today() - timedelta(1), datetime.min.time())

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': yesterday,
    'email': FAILURE_EMAILS,
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5)
}

dag = DAG('Child_Job', default_args=default_args, schedule_interval='@daily')  # dag_id may not contain spaces

execute_notebook = PostgresOperator(
    task_id='data_sql',
    postgres_conn_id='REDSHIFT_CONN',
    sql="SELECT * FROM athena_rs.shipments limit 5",
    dag=dag
)
Upvotes: 28
Views: 65230
Reputation: 166
This is based on @y2k-shubham's answer. Some imports and params in Airflow have changed since that was written; it's unclear in which version the changes occurred, because Airflow's function docs don't include "added in version" notes. I'm using 2.10.
I've used the context manager interface.
### In parent_dag.py ###
from airflow import DAG
from airflow.operators.empty import EmptyOperator
from airflow.operators.trigger_dagrun import TriggerDagRunOperator  # changed import
from airflow.utils.trigger_rule import TriggerRule

parent_dag = DAG(
    dag_id="parent_dag_id",
    schedule_interval=None,
)

with parent_dag:
    # An arbitrary task in your parent DAG
    started = EmptyOperator(task_id="parent_start")

    # Trigger pointed at your child DAG
    trigger_task = TriggerDagRunOperator(
        trigger_dag_id="child_dag_id",  # no longer `external_dag_id`
        task_id="trigger_task",
        trigger_rule=TriggerRule.ALL_SUCCESS,
    )

    started >> trigger_task
I prefer this to the ExternalTaskSensor approach: we maintain only one schedule instead of two, we don't need to build a margin between the parent and child schedules or wait for polling retries, and therefore the whole pipeline completes sooner.
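For completeness, here is a minimal sketch of what the child side might look like under this approach; the file name, dag_id, and task are my assumptions, not part of the original answer. The key point is that the child DAG has no schedule of its own and runs only when triggered:

### In child_dag.py (hypothetical counterpart) ###
from airflow import DAG
from airflow.operators.empty import EmptyOperator

# No schedule: this DAG runs only when the parent's
# TriggerDagRunOperator fires it.
child_dag = DAG(
    dag_id="child_dag_id",
    schedule=None,  # `schedule_interval=None` on pre-2.4 versions
)

with child_dag:
    child_start = EmptyOperator(task_id="child_start")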
Upvotes: 1
Reputation: 11627
As requested by @pankaj, I'm hereby adding a snippet depicting reactive triggering using TriggerDagRunOperator (as opposed to poll-based triggering with ExternalTaskSensor).
from typing import List
from airflow.models.baseoperator import BaseOperator
from airflow.models.dag import DAG
from airflow.operators.dagrun_operator import TriggerDagRunOperator
from airflow.utils.trigger_rule import TriggerRule
# DAG object
my_dag: DAG = DAG(dag_id='my_dag',
                  start_date=..)
..
# a list of 'tail' tasks: tasks that have no downstream tasks
tail_tasks_of_first_dag: List[BaseOperator] = my_magic_function_that_determines_all_tail_tasks(..)
..
# our trigger task
my_trigger_task: TriggerDagRunOperator = TriggerDagRunOperator(dag=my_dag,
                                                               task_id='my_trigger_task',
                                                               trigger_rule=TriggerRule.ALL_SUCCESS,
                                                               external_dag_id='id_of_dag_to_be_triggered')
# our trigger task should run when all 'tail' tasks have completed / succeeded
tail_tasks_of_first_dag >> my_trigger_task
Note that the snippet is for reference purposes only; it has NOT been tested.
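As an aside not covered by the snippet above: in Airflow 2.x, TriggerDagRunOperator also accepts a templated conf dict if the triggered DAG needs context from the parent. A minimal sketch; the task_id and payload key are my assumptions:

from airflow.operators.trigger_dagrun import TriggerDagRunOperator

# Hypothetical variant: pass a payload the child can read via dag_run.conf
my_trigger_task_with_conf = TriggerDagRunOperator(
    task_id='my_trigger_task_with_conf',
    trigger_dag_id='id_of_dag_to_be_triggered',
    conf={'parent_run_id': '{{ run_id }}'},  # `conf` is a templated field
)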
Upvotes: 14
Reputation: 1832
The answer is in this thread already; below is demo code.
Parent dag:
from datetime import datetime
from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2020, 4, 29),
}

dag = DAG('Parent_dag', default_args=default_args, schedule_interval='@daily')

leave_work = DummyOperator(
    task_id='leave_work',
    dag=dag,
)

cook_dinner = DummyOperator(
    task_id='cook_dinner',
    dag=dag,
)

leave_work >> cook_dinner
Child dag:
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.sensors import ExternalTaskSensor
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2020, 4, 29),
}

dag = DAG('Child_dag', default_args=default_args, schedule_interval='@daily')

# Use ExternalTaskSensor to listen for Parent_dag's cook_dinner task;
# this Child_dag run proceeds once cook_dinner has succeeded.
wait_for_dinner = ExternalTaskSensor(
    task_id='wait_for_dinner',
    external_dag_id='Parent_dag',
    external_task_id='cook_dinner',
    start_date=datetime(2020, 4, 29),
    # Both DAGs share the same @daily schedule, so no execution_delta is
    # needed here; set one only if the schedules are offset.
    timeout=3600,
    dag=dag,  # the sensor must be attached to the DAG
)

have_dinner = DummyOperator(
    task_id='have_dinner',
    dag=dag,
)

play_with_food = DummyOperator(
    task_id='play_with_food',
    dag=dag,
)

wait_for_dinner >> have_dinner
wait_for_dinner >> play_with_food
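If the parent and child ran on different schedules, the sensor would need to be told which parent run to look for, via execution_delta (or execution_date_fn). A sketch continuing the child DAG file above, assuming hypothetically that the parent runs one hour before the child:

# Hypothetical variant for offset schedules: the matching Parent_dag run
# has a logical date one hour earlier than this Child_dag run.
wait_for_dinner_offset = ExternalTaskSensor(
    task_id='wait_for_dinner_offset',
    external_dag_id='Parent_dag',
    external_task_id='cook_dinner',
    execution_delta=timedelta(hours=1),
    timeout=3600,
    dag=dag,
)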
(Screenshots omitted: the DAGs list, the Parent_dag graph view, and the Child_dag graph view.)
Upvotes: 40
Reputation: 2657
I believe you are looking for the SubDagOperator: running a DAG inside a bigger DAG. Note that creating many subdags like in the example below gets messy pretty quickly, so I recommend splitting each subdag into its own file and importing them in a main file.
The SubDagOperator is simple to use: you need to give it a task_id, a subdag (the child), and a dag (the parent).
subdag_2 = SubDagOperator(
    task_id="just_some_id",
    subdag=child_subdag,  # <-- this must be a DAG
    dag=parent_dag,       # <-- this must be a DAG
)
From their examples repo
from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.subdag_operator import SubDagOperator
from airflow.utils.dates import days_ago
def subdag(parent_dag_name, child_dag_name, args):
    dag_subdag = DAG(
        dag_id='%s.%s' % (parent_dag_name, child_dag_name),
        default_args=args,
        schedule_interval="@daily",
    )
    for i in range(5):
        DummyOperator(
            task_id='%s-task-%s' % (child_dag_name, i + 1),
            default_args=args,
            dag=dag_subdag,
        )
    return dag_subdag
DAG_NAME = 'example_subdag_operator'

args = {
    'owner': 'airflow',
    'start_date': days_ago(2),
}

dag = DAG(
    dag_id=DAG_NAME,
    default_args=args,
    schedule_interval="@once",
    tags=['example'],
)

start = DummyOperator(
    task_id='start-of-main-job',
    dag=dag,
)

some_other_task = DummyOperator(
    task_id='some-other-task',
    dag=dag,
)

end = DummyOperator(
    task_id='end-of-main-job',
    dag=dag,
)

# named subdag_task so it does not shadow the subdag() factory above
subdag_task = SubDagOperator(
    task_id='run-this-dag-after-previous-steps',
    subdag=subdag(DAG_NAME, 'run-this-dag-after-previous-steps', args),
    dag=dag,
)

start >> some_other_task >> end >> subdag_task
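Worth noting: in Airflow 2.x, SubDagOperator is deprecated in favor of TaskGroup, which groups tasks within a single DAG instead of running a nested one. A minimal sketch of the same shape using TaskGroup (the dag_id, group_id, and task names are my own, not from the example repo):

from airflow import DAG
from airflow.operators.empty import EmptyOperator
from airflow.utils.dates import days_ago
from airflow.utils.task_group import TaskGroup

with DAG(dag_id='example_taskgroup', start_date=days_ago(2), schedule_interval='@once') as dag:
    start = EmptyOperator(task_id='start-of-main-job')

    # TaskGroup replaces the nested DAG: the grouped tasks live in this DAG
    with TaskGroup(group_id='run-this-group-after-previous-steps') as group:
        for i in range(5):
            EmptyOperator(task_id='task-%s' % (i + 1))

    start >> group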
Upvotes: 5