searain

Reputation: 3301

In Airflow data-aware scheduling, how can I schedule a DAG when any one of the datasets in the schedule list is updated?

According to the Airflow documentation:

https://airflow.apache.org/docs/apache-airflow/2.4.0/concepts/datasets.html#multiple-datasets

Multiple Datasets

As the schedule parameter is a list, DAGs can require multiple datasets, and the DAG will be scheduled once all datasets it consumes have been updated at least once since the last time it was run:

with DAG(
    dag_id='multiple_datasets_example',
    schedule=[
        example_dataset_1,
        example_dataset_2,
        example_dataset_3,
    ],
    ...,
):

So this consumer DAG is scheduled only once all of the datasets in its schedule have been updated.

What if I want the DAG to be scheduled when any one of the datasets in the schedule list is updated? Is it true that I cannot use this approach and would have to fall back on the more traditional TriggerDagRunOperator?
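To make the two behaviors concrete, here is a plain-Python sketch of the difference between "all datasets updated" and "any dataset updated". It is only an illustration: the function names and timestamps are made up, and this is not how Airflow's scheduler is implemented.

```python
from datetime import datetime

def ready_all(last_run, updates):
    """Documented behavior: every dataset was updated since the last run."""
    return all(ts is not None and ts > last_run for ts in updates.values())

def ready_any(last_run, updates):
    """Desired behavior: at least one dataset was updated since the last run."""
    return any(ts is not None and ts > last_run for ts in updates.values())

# Hypothetical state: only example_dataset_1 has changed since the last run.
last_run = datetime(2023, 1, 1)
updates = {
    "example_dataset_1": datetime(2023, 1, 2),    # updated after the last run
    "example_dataset_2": None,                    # never updated
    "example_dataset_3": datetime(2022, 12, 31),  # updated before the last run
}

print(ready_all(last_run, updates))  # False
print(ready_any(last_run, updates))  # True
```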

Upvotes: 0

Views: 1228

Answers (1)

TJaniF

Reputation: 1046

Yes, you are correct: as of now (Airflow 2.5.0), if you provide more than one dataset to the schedule parameter, the DAG waits for all of them to be updated before running. More configuration around this behavior is being discussed; I found a reference to it in AIP-48, in the future work section (second-to-last paragraph). For now, though, you'd have to use the TriggerDagRunOperator.

Upvotes: 2
