codninja0908
codninja0908

Reputation: 567

Facing task Timeout Error while dag parsing in Airflow version 2.2.5

I am hitting the task timeout error with Airflow Version 2.2.5/Composer 2.0.15. The same code is running absolutely fine in Airflow version2.2.3 /Composer Version 1.18.0

Error Message :

Broken DAG: [/home/airflow/gcs/dags/test_dag.py] Traceback (most recent call last):
  File "/opt/python3.8/lib/python3.8/enum.py", line 256, in __new__
    if canonical_member._value_ == enum_member._value_:
  File "/opt/python3.8/lib/python3.8/site-packages/airflow/utils/timeout.py", line 37, in handle_timeout
    raise AirflowTaskTimeout(self.error_message)
airflow.exceptions.AirflowTaskTimeout: DagBag import timeout for /home/airflow/gcs/dags/test_dag.py after 30.0s.
Please take a look at these docs to improve your DAG import time:
* https://airflow.apache.org/docs/apache-airflow/2.2.5/best-practices.html#top-level-python-code
* https://airflow.apache.org/docs/apache-airflow/2.2.5/best-practices.html#reducing-dag-complexity, PID: 1827

As per the documentation or the links in error message about Top Level Python code. We have a framework in place for Dags and tasks.

main_folder

|___ dags

|___ tasks

|___ libs

a) All the main dag files are in dags folder

b) Actual functions or queries (from PythonOperator functions/ Sql Queries) are placed in *.py files under tasks folder

c) Common functionalities are placed in python files in libs folder.

Providing basic dag structure here:

# Import libraries and functions 
import datetime

from airflow import models, DAG
from airflow.contrib.operators import bigquery_operator, bigquery_to_gcs, bigquery_table_delete_operator
from airflow.operators.python_operator import PythonOperator
from airflow.operators.bash_operator import BashOperator
##from airflow.executors.sequential_executor import SequentialExecutor
from airflow.utils.task_group import TaskGroup

## Import codes from tasks and libs folder
from libs.compres_suppress.cot_suppress import *
from libs.teams_plugin.teams_plugin import *
from tasks.email_code.trigger_email import *

# Set up Airflow DAG
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime.datetime(2020, 12, 15, 0),
    'retries': 1,
    'retry_delay': datetime.timedelta(minutes=1),
    'on_failure_callback': trigger_email
}

DAG_ID = 'test_dag'


# Check exscution date
if "<some condition>" matches:
    run_date = <date in config file>
else:
    run_date = datetime.datetime.now().strftime("%Y-%m-%d")
    run_date_day = datetime.datetime.now().isoweekday()


dag = DAG(
    DAG_ID,
    default_args=default_args, catchup=False,
    max_active_runs=1, schedule_interval=SCHEDULE_INTERVAL
)


next_dag_name = "next_dag1"
if env == "prod":
    if run_date_day == 7:
        next_dag_name = "next_dag2"
    else:
        next_dag_name = "next_dag1"


run_id = datetime.datetime.now().strftime("%Y-%m-%dT%H:%M:%S")

# Define Airflow DAG
with dag:

    team_notify_task = MSTeamsWebhookOperator(
        task_id='teams_notifi_start_task',
        http_conn_id='http_conn_id',
        message=f"DAG has started <br />"
                f"<strong> DAG ID:</strong> {DAG_ID}.<br />",
        theme_color="00FF00",
        button_text="My button",
        dag=dag)
        
    task1_bq = bigquery_operator.BigQueryOperator(
        task_id='task1',
        sql=task1_query(
            table1="table1",
            start_date=start_date),
        use_legacy_sql=False,
        destination_dataset_table="destination_tbl_name",
        write_disposition='WRITE_TRUNCATE'      
    )

    ##### Base Skeletons #####
    with TaskGroup("taskgroup_lbl", tooltip="taskgroup_sample") as task_grp:
         tg_process(args=default_args,run_date=run_date)
         

    if run_mode == "<env_name>" and next_dag != "":
        next_dag_trigg = BashOperator(
            task_id=f'trigger_{next_dag_name}',
            bash_command="gcloud composer environments run " + <env> + "-cust_comp --location us-east1 dags trigger -- " + next_dag_name + " --run-id='trigger_ "'"
        )
        task_grp >> next_dag_trigger
        
    team_notify_task >> task1_bq >> task_grp 
    enter code here

Can someone help on this on what is causing the issue?

Upvotes: 0

Views: 7218

Answers (1)

codninja0908
codninja0908

Reputation: 567

Increasing the dag/task timeout time does the trick.

Go to Airflow (Web UI), On the top bar navigate to

Variables--> Configuration --> [core] --> dagbag_import_timeout = <changed from 30(default) to 160>.

If using Composer, the same can be done through following steps.

a) Go to Composer service and select the composer to which the settings are to be modified.

b) Click on AIRFLOW CONFIGURATION OVERRIDES --> EDIT --> (add/edit) dagbag_import_timeout=160

c) Click on save

Upvotes: 3

Related Questions