Rory Wallis

Reputation: 31

Apache Airflow keeps repeating DAG

I've been following the steps laid out here https://airflow.apache.org/docs/apache-airflow/stable/start/docker.html to get Apache Airflow set up in Docker. Since setting it up, I've noticed that the DAG seems to run on a loop without being explicitly told to. I know this because I've set up Python code to trigger email alerts to me if a given task fails, and I'm getting alerts constantly, even when the DAG isn't running in the interface (literally multiple times per minute as long as the server is spun up). I also have CSVs that should only update when the steps run, and they seem to be refreshing spontaneously (i.e. I can delete the files from the folder and they reappear soon after without me doing anything).

Any suggestions would be appreciated!

Upvotes: 2

Views: 916

Answers (2)

Rory Wallis

Reputation: 31

This is how I have set it up:

# Imports needed for this snippet
from datetime import datetime

from airflow import DAG

# Set default arguments
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2021, 5, 27),
    'retries': 0,
    'email': ["[email protected]"],
    'email_on_failure': True
}


# Initialise the DAG
dag = DAG(dag_id="AIM_Pipeline", default_args=default_args, schedule_interval=None,
          catchup=False)
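
For reference, here's a minimal sketch of how a task hangs off this DAG so that a failure fires the email alert; the task id and callable are just placeholders rather than my actual pipeline code:

from airflow.operators.python import PythonOperator


def run_pipeline_step():
    # Placeholder for the real pipeline logic that writes the CSVs
    pass


pipeline_task = PythonOperator(
    task_id="run_pipeline_step",        # placeholder task id
    python_callable=run_pipeline_step,
    dag=dag,                            # attach to the DAG defined above
)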

Upvotes: 1

blackraven

Reputation: 5637

The DAG should be specified like this:

from datetime import datetime, timedelta

from airflow import DAG

DAG_ID = 'dag_name_here'
start_date = datetime(2020, 11, 30)
default_args = {'owner': 'airflow',
                'depends_on_past': False,
                'retries': 2,
                'retry_delay': timedelta(minutes=1),
                'start_date': start_date
                }
dag = DAG(dag_id=DAG_ID,
          default_args=default_args,
          schedule_interval=None   # see options below
          )

To run once every minute: schedule_interval = '* * * * *'

To run once every day at 09:00am: schedule_interval = '00 09 * * *'

To run on manual trigger: schedule_interval = None
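
Putting it together, a minimal runnable sketch using the daily 09:00 option (the dag id, task id and callable are placeholders, so swap in your own):

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def say_hello():
    # Placeholder callable so the example actually runs
    print("hello")


default_args = {'owner': 'airflow',
                'depends_on_past': False,
                'retries': 2,
                'retry_delay': timedelta(minutes=1),
                'start_date': datetime(2020, 11, 30)
                }

with DAG(dag_id='example_daily_dag',        # placeholder dag id
         default_args=default_args,
         schedule_interval='00 09 * * *',   # once a day at 09:00
         catchup=False                      # don't backfill runs between start_date and now
         ) as dag:
    hello_task = PythonOperator(task_id='say_hello',
                                python_callable=say_hello)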

Upvotes: 0
