Newt

Reputation: 887

Airflow schedule_interval and active DAG runs

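from datetime import datetime, timedelta

from airflow import DAG

# Define the DAG instance for processing the training data.
# dag_id and default_args are assumed to be defined elsewhere in the file.
dag = DAG(
    dag_id,
    start_date=datetime(2019, 11, 14),
    description='Reading training logs from the corresponding location',
    default_args=default_args,
    schedule_interval=timedelta(hours=1),
)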

My code looks like this. As I understand it, this DAG should run once every hour. But in the Airflow web UI I see many runs in the Schedule column, and the DAG is executing all the time. In particular, in the Tree View I can see that all the blocks were filled within one hour! I am confused about how schedule_interval works. Any ideas on how to fix this?

Upvotes: 0

Views: 322

Answers (2)

SMDC

Reputation: 717

The first DAG run covers the interval that begins at the date you define in start_date; the scheduler triggers it once that interval has ended. From that point on, the scheduler creates new DagRuns based on your schedule_interval, and the corresponding task instances run as their dependencies are met. You can read more about it here.
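
To make the timing concrete, here is a minimal sketch in plain Python (no Airflow required) of how run dates follow from start_date and schedule_interval. The helper name first_runs is hypothetical and is not part of Airflow; it only mirrors the rule that each run is labelled with the start of its interval and becomes eligible once that interval has ended.

```python
from datetime import datetime, timedelta


def first_runs(start_date, interval, count):
    """Return (execution_date, earliest_trigger_time) pairs for the first runs.

    Hypothetical helper for illustration only, not an Airflow API.
    """
    runs = []
    for i in range(count):
        # Each run is labelled with the start of the interval it covers...
        execution_date = start_date + i * interval
        # ...and can only be triggered after that interval has ended.
        runs.append((execution_date, execution_date + interval))
    return runs


# Values taken from the DAG in the question.
schedule = first_runs(datetime(2019, 11, 14), timedelta(hours=1), 3)
for execution_date, trigger_time in schedule:
    print(execution_date, "->", trigger_time)
```

With the values from the question, the run labelled 2019-11-14 00:00 becomes eligible at 01:00, the next one at 02:00, and so on.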

Upvotes: 1

Newt

Reputation: 887

I figured it out: the problem comes from the mismatch between the current time and start_date. If the start_date is earlier than the current time, the scheduler backfills all the past intervals.
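
As a rough illustration of why the Tree View fills up so quickly, the sketch below counts how many hourly intervals have already ended by the time the DAG is switched on. The function name pending_runs and the "now" value are assumptions made up for this example. If I remember the API correctly, passing catchup=False to the DAG constructor, or choosing a start_date close to the present, keeps the scheduler from creating all of these past runs.

```python
from datetime import datetime, timedelta


def pending_runs(start_date, interval, now):
    """Count the completed intervals the scheduler would catch up on.

    Hypothetical helper for illustration only, not an Airflow API.
    """
    if now < start_date + interval:
        # No interval has finished yet, so nothing is waiting to run.
        return 0
    # Dividing two timedeltas with // gives the number of whole intervals.
    return (now - start_date) // interval


start = datetime(2019, 11, 14)
# Assumed moment at which the DAG is first enabled (made up for the example).
now = datetime(2019, 11, 15, 12, 30)
backlog = pending_runs(start, timedelta(hours=1), now)
print(backlog, "runs would be created at once")
```

Here a start_date only a day and a half in the past already produces dozens of runs that the scheduler tries to execute immediately.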

Upvotes: 0
