Reputation: 31
I've been following the steps laid out here https://airflow.apache.org/docs/apache-airflow/stable/start/docker.html to get Apache Airflow set up in Docker. Once it's set up, I've noticed that the DAG seems to run on a loop without being explicitly told to. I know this because I've set up Python code to trigger email alerts to me if a given task fails, and I'm getting alerts constantly, even when the DAG isn't running in the interface (literally multiple times per minute, as long as the server is spun up). I also have CSVs that should only update when the steps run, and they seem to be refreshing spontaneously (i.e. I can delete the files from the folder and they will reappear soon after without me doing anything).
Any suggestions would be appreciated!
Upvotes: 2
Views: 916
Reputation: 31
This is how I have set it up:
from datetime import datetime

from airflow import DAG

# Set default arguments
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2021, 5, 27),
    'retries': 0,
    'email': ["[email protected]"],
    'email_on_failure': True
}

# Initialise the DAG
dag = DAG(dag_id="AIM_Pipeline",
          default_args=default_args,
          schedule_interval=None,
          catchup=False)
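The tasks are then attached to this DAG object in the usual way; as a minimal sketch (the BashOperator and task_id below are illustrative placeholders, not my actual tasks):

from airflow.operators.bash import BashOperator

# Illustrative only: with email_on_failure set in default_args,
# a failure of any task attached to the DAG sends an alert email.
example_task = BashOperator(
    task_id="example_task",
    bash_command="echo 'running'",
    dag=dag,
)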
Upvotes: 1
Reputation: 5637
The DAG should be specified like this:

from datetime import datetime, timedelta

from airflow import DAG

DAG_ID = 'dag_name_here'
start_date = datetime(2020, 11, 30)

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'retries': 2,
    'retry_delay': timedelta(minutes=1),
    'start_date': start_date
}

dag = DAG(dag_id=DAG_ID,
          default_args=default_args,
          schedule_interval=None  # see options below
          )
To run once every minute: schedule_interval = '* * * * *'
To run once every day at 09:00am: schedule_interval = '00 09 * * *'
To run on manual trigger: schedule_interval = None
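Putting it together, a minimal manually-triggered DAG might look like the sketch below (the dag_id and the print_date task are illustrative placeholders):

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# schedule_interval=None means the DAG runs only when triggered manually;
# catchup=False stops the scheduler from backfilling runs for past dates.
with DAG(dag_id='manual_only_example',
         start_date=datetime(2020, 11, 30),
         schedule_interval=None,
         catchup=False) as dag:

    # Placeholder task: just prints the current date
    print_date = BashOperator(task_id='print_date',
                              bash_command='date')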
Upvotes: 0