Abraham Tom

Reputation: 41

Airflow dag bash task lag on remote executions

I am experimenting with Airflow to replace our existing cron orchestration, and everything looks promising. I have successfully installed it and gotten a DAG scheduled and executed, but I noticed that there is a significant delay between each of the tasks I have specified (anywhere from 15 to 60 minutes).

My DAG is defined as follows:

Am I missing something to make them run one right after the other?

I am not using Celery; both the scheduler and the webserver are running on the same host. Yes, I need to call for a remote execution (I am working on some form of local workaround until then), and no, I cannot install Airflow on the remote server. The DAG should run once a day at 1 AM UTC and follow the set path of tasks I have given it.

import airflow
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from datetime import timedelta

args = {
    'owner': 'user1',
    'depends_on_past': True,
    'start_date': airflow.utils.dates.days_ago(2),
    'email': ['[email protected]'],
    'email_on_failure': True,
    'email_on_retry': False,
    'wait_for_downstream': True,
    'retries': 1,
    'retry_delay': timedelta(minutes=5)
}

dag = DAG(
    dag_id='airflow_pt1',
    default_args=args,
    schedule_interval='0 1 * * *',
    dagrun_timeout=timedelta(hours=8))

task1 = BashOperator(
    task_id='task1',
    bash_command='ssh user1@remoteserver /path/to/remote/execution/script_task1.sh',
    dag=dag, env=None, output_encoding='utf-8')

task2 = BashOperator(
    task_id='task2',
    bash_command='ssh user1@remoteserver /path/to/remote/execution/script_task2.sh',
    dag=dag, env=None, output_encoding='utf-8')

task3 = BashOperator(
    task_id='task3',
    bash_command='ssh user1@remoteserver /path/to/remote/execution/script_task3.sh',
    dag=dag, env=None, output_encoding='utf-8')

task4 = BashOperator(
    task_id='task4',
    bash_command='ssh user1@remoteserver /path/to/remote/execution/script_task4.sh',
    dag=dag, env=None, output_encoding='utf-8')

task2.set_upstream(task1)
task3.set_upstream(task1)
task4.set_upstream(task2)

Note: I have not executed `airflow backfill` (is that important?)

Upvotes: 0

Views: 1627

Answers (1)

Abraham Tom

Reputation: 41

Found the issue: I had not changed the executor from SequentialExecutor to LocalExecutor in the airflow.cfg file.
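For reference, this is roughly what the relevant setting in airflow.cfg looks like (a minimal sketch; note that LocalExecutor requires a real database backend such as PostgreSQL or MySQL for the metadata store, not the default SQLite):

```ini
[core]
# SequentialExecutor (the default) runs only one task instance at a time,
# which produces the long gaps between tasks; LocalExecutor runs tasks
# in parallel subprocesses on the same host.
executor = LocalExecutor
```

After editing the file, restart both the scheduler and the webserver so the change takes effect.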

I found my answer through https://stlong0521.github.io/20161023%20-%20Airflow.html

and watching the detailed video in https://www.youtube.com/watch?v=Pr0FrvIIfTU

Upvotes: 1
