lucy

Reputation: 4506

Hourly run dag in Airflow

My DAG:

from datetime import datetime

from airflow import DAG

default_args = {
    'owner': 'airflow',
    'start_date': datetime(2020, 1, 10, 7, 1, 0),
    'depends_on_past': False,
    'catchup_by_default': False,
}

dag = DAG('Hourly_test_2', schedule_interval='0 * * * *', default_args=default_args)

It runs every hour, but the tree view shows a time one hour behind. For example, the tree view shows 8 AM when the actual time is 9 AM. How can I sync the two times?

The job should run every hour, and the hour shown in the tree view should match the current hour.


Upvotes: 1

Views: 10245

Answers (2)

Emma

Reputation: 9308

This is how Airflow scheduling works. See this part of the scheduler documentation:

Note that if you run a DAG on a schedule_interval of one day, the run stamped 2016-01-01 will be triggered soon after 2016-01-01T23:59. In other words, the job instance is started once the period it covers has ended.

Let's Repeat That: The scheduler runs your job one schedule_interval AFTER the start date, at the END of the period.

Ref: https://airflow.apache.org/docs/stable/scheduler.html
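Applied to the hourly DAG in the question, the rule quoted above can be sketched with plain datetime arithmetic. This is only an illustration of the rule, not Airflow code; the helper name trigger_time is made up for this example.

```python
from datetime import datetime, timedelta


def trigger_time(execution_date: datetime, interval: timedelta) -> datetime:
    """Earliest moment a run stamped `execution_date` can start (hypothetical helper)."""
    # The stamp marks the START of the period the run covers;
    # the run itself starts once that period has ENDED.
    return execution_date + interval


stamp = datetime(2020, 1, 10, 8, 0)  # what the tree view shows
print(stamp, "->", trigger_time(stamp, timedelta(hours=1)))
```

So a run shown as 8:00 in the tree view is the one that actually starts at about 9:00, which matches what the question describes.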

Upvotes: 2

bxacosta

Reputation: 49

This is not a time synchronization problem. It follows from start_date and schedule_interval: by default, Airflow works out how many times the DAG should have run between start_date and the current date, and starts a DAG Run for every interval that has not been executed yet (the catchup behavior).

In your case the start date is 7:01, so according to your schedule_interval the execution intervals are 8:00, 9:00, 10:00, ...
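As a rough sketch of how those intervals come about, the snippet below lists the hourly stamps for a given start date and current time using only the standard library. It assumes the '0 * * * *' schedule from the question; hourly_execution_dates is a hypothetical helper, not part of Airflow.

```python
from datetime import datetime, timedelta

HOUR = timedelta(hours=1)


def hourly_execution_dates(start_date: datetime, now: datetime) -> list:
    """Stamps of hourly runs whose covered period has ended by `now` (hypothetical helper)."""
    # First top-of-the-hour mark at or after start_date (7:01 -> 8:00).
    first = start_date.replace(minute=0, second=0, microsecond=0)
    if first < start_date:
        first += HOUR

    stamps = []
    stamp = first
    # A run is only due once its whole one-hour period has passed.
    while stamp + HOUR <= now:
        stamps.append(stamp)
        stamp += HOUR
    return stamps


for s in hourly_execution_dates(datetime(2020, 1, 10, 7, 1), datetime(2020, 1, 10, 10, 30)):
    print("run stamped", s, "starts at", s + HOUR)
```

With a start date of 7:01 and a current time of 10:30, this gives stamps 8:00 and 9:00; the 10:00 run is not due until 11:00.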

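This is why there is a DAG Run at 8:00. You can disable this default behavior by setting the parameter catchup=False in your DAG definition. Note that catchup is an argument of DAG itself; catchup_by_default is an airflow.cfg setting, so putting it in default_args has no effect.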

dag = DAG('Hourly_test_2', catchup=False, schedule_interval='0 * * * *', default_args=default_args)

Upvotes: 2
