aeapen
aeapen

Reputation: 923

How to Trigger a DAG on the success of a another DAG in Airflow using Python?

I have a python DAG Parent Job and DAG Child Job. The tasks in the Child Job should be triggered on the successful completion of the Parent Job tasks which are run daily. How can add external job trigger ?

MY CODE

from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.postgres_operator import PostgresOperator
from utils import FAILURE_EMAILS

yesterday = datetime.combine(datetime.today() - timedelta(1), datetime.min.time())


default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': yesterday,
    'email': FAILURE_EMAILS,
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5)
}

dag = DAG('Child Job', default_args=default_args, schedule_interval='@daily')

execute_notebook = PostgresOperator(
  task_id='data_sql',
  postgres_conn_id='REDSHIFT_CONN',
  sql="SELECT * FROM athena_rs.shipments limit 5",
  dag=dag
)

Upvotes: 28

Views: 65230

Answers (4)

eldrly
eldrly

Reputation: 166

This is based on @y2k-shubham's answer. Some imports and params in Airflow changed since that was written. Unclear which version the changes occurred in because Airflow doesn't include 'added in version' in their function docs. I'm using 2.10

I've used the context manager interface.

### In parent_dag.py ###

from airflow.operators.trigger_dagrun import TriggerDagRunOperator # changed import
from airflow.utils.trigger_rule import TriggerRule
with airflow.operators.empty import EmptyOperator

parent_dag = DAG(
    dag_id="parent_dag_id",
    schedule_interval= None
)

with parent_dag:

    # An arbitrary task in your parent DAG
    started = EmptyOperator(task_id = "parent_start")

    # Trigger pointed at your child DAG
    trigger_task = TriggerDagRunOperator(
        trigger_dag_id = 'child_dag_id', # no longer `external_dag_id`
        task_id = 'trigger_task',
        trigger_rule = TriggerRule.ALL_SUCCESS,
    )
    
    started >> trigger_task

I prefer this to the ExternalTaskSensor approach. We can maintain only 1 schedule, instead of 2. We don't need to build a margin between the parent schedule and child schedule or wait for polling retries. Therefore the whole pipeline will complete sooner.

Upvotes: 1

y2k-shubham
y2k-shubham

Reputation: 11627

As requested by @pankaj, I'm hereby adding a snippet depicting reactive-triggering using TriggerDagRunOperator (as opposed to poll-based triggering of ExternalTaskSensor)

from typing import List

from airflow.models.baseoperator import BaseOperator
from airflow.models.dag import DAG
from airflow.operators.dagrun_operator import TriggerDagRunOperator
from airflow.utils.trigger_rule import TriggerRule

# DAG object
my_dag: DAG = DAG(dag_id='my_dag',
                  start_date=..)
..
# a list of 'tail' tasks: tasks that have no downstream tasks
tail_tasks_of_first_dag: List[BaseOperator] = my_magic_function_that_determines_all_tail_tasks(..)
..

# our trigger task
my_trigger_task: TriggerDagRunOperator = TriggerDagRunOperator(dag=my_dag,
                                                               task_id='my_trigger_task',
                                                               trigger_rule=TriggerRule.ALL_SUCCESS,
                                                               external_dag_id='id_of_dag_to_be_triggered')
# our trigger task should run when all 'tail' tasks have completed / succeeded
tail_tasks_of_first_dag >> my_trigger_task

Note that snippet is for reference purpose only; it has NOT been tested


Points to note / References

Upvotes: 14

moon
moon

Reputation: 1832

Answer is in this thread already. Below is demo code:

Parent dag:

from datetime import datetime
from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2020, 4, 29),
}

dag = DAG('Parent_dag', default_args=default_args, schedule_interval='@daily')

leave_work = DummyOperator(
    task_id='leave_work',
    dag=dag,
)
cook_dinner = DummyOperator(
    task_id='cook_dinner',
    dag=dag,
)

leave_work >> cook_dinner

Child dag:

from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.sensors import ExternalTaskSensor

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2020, 4, 29),
}

dag = DAG('Child_dag', default_args=default_args, schedule_interval='@daily')

# Use ExternalTaskSensor to listen to the Parent_dag and cook_dinner task
# when cook_dinner is finished, Child_dag will be triggered
wait_for_dinner = ExternalTaskSensor(
    task_id='wait_for_dinner',
    external_dag_id='Parent_dag',
    external_task_id='cook_dinner',
    start_date=datetime(2020, 4, 29),
    execution_delta=timedelta(hours=1),
    timeout=3600,
)

have_dinner = DummyOperator(
    task_id='have_dinner',
    dag=dag,
)
play_with_food = DummyOperator(
    task_id='play_with_food',
    dag=dag,
)

wait_for_dinner >> have_dinner
wait_for_dinner >> play_with_food

Images:

Dags

Dags

Parent_dag

Parent_dag

Child_dag

Child_dag

Upvotes: 40

Bernardo stearns reisen
Bernardo stearns reisen

Reputation: 2657

I believe you are looking for SubDags operator, running a Dag in a bigger dag. Note that creating many subdags like in the example below gets messy pretty quick, so I recommend splitting each subdag in a file and importing then in a main file.

The SubDagOperator is simple to use you need to give an Id, a subdag (the child) and a dag(the parent)

subdag_2 = SubDagOperator(
        task_id="just_some_id", 
        subdag=child_subdag, <---- this must be a DAG
        dag=parent_dag, <----- this must be a DAG
        )

It will look like this: this

From their examples repo

from airflow import DAG
from airflow.example_dags.subdags.subdag import subdag
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.subdag_operator import SubDagOperator
from airflow.utils.dates import days_ago
from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
def subdag(parent_dag_name, child_dag_name, args):
    dag_subdag = DAG(
            dag_id='%s.%s' % (parent_dag_name, child_dag_name),
            default_args=args,
            schedule_interval="@daily",
            )

    for i in range(5):
        DummyOperator(
                task_id='%s-task-%s' % (child_dag_name, i + 1),
                default_args=args,
                dag=dag_subdag,
                )

    return dag_subdag

DAG_NAME = 'example_subdag_operator'

args = {
        'owner': 'airflow',
        'start_date': days_ago(2),
        }

dag = DAG(
        dag_id=DAG_NAME,
        default_args=args,
        schedule_interval="@once",
        tags=['example']
        )

start = DummyOperator(
        task_id='start-of-main-job',
        dag=dag,
        )

some_other_task = DummyOperator(
        task_id='some-other-task',
        dag=dag,
        )


end = DummyOperator(
        task_id='end-of-main-job',
        dag=dag,
        )

subdag = SubDagOperator(
        task_id='run-this-dag-after-previous-steps',
        subdag=subdag(DAG_NAME, 'run-this-dag-after-previous-steps', args),
        dag=dag,
        )

start >> some_other_task >> end >> subdag

Upvotes: 5

Related Questions