Raven

Reputation: 41

Airflow task exited with return code 1 without any warning/error message

Apache Airflow version: 1.10.10

Kubernetes version (if you are using kubernetes) (use kubectl version): Not using Kubernetes or docker

Environment: CentOS Linux release 7.7.1908 (Core) Linux 3.10.0-1062.el7.x86_64

Python Version: 3.7.6

Executor: LocalExecutor

What happened:

I wrote a simple DAG to clean up Airflow logs. Everything is OK when I test it with the 'airflow test' command. I also triggered it manually in the web UI, which uses the 'airflow run' command to start the task, and it was still OK.

But after I rebooted my server and restarted my webserver & scheduler services (in daemon mode), every time I trigger the exact same DAG, it still gets scheduled as usual, but it exits with code 1 immediately after starting a new process to run the task.

I used the 'airflow test' command again to check whether something was now wrong with my code, but everything still seems OK under 'airflow test'; it only exits silently under 'airflow run'. It is really weird.

Here's the task log when it's manually triggered in the web UI (I've changed the log level to DEBUG, but still can't find anything useful), or you can read the attached log file: task error log.txt

Reading local file: /root/airflow/logs/airflow_log_cleanup/log_cleanup_worker_num_1/2020-04-29T13:51:44.071744+00:00/1.log
[2020-04-29 21:51:53,744] {base_task_runner.py:61} DEBUG - Planning to run as the user
[2020-04-29 21:51:53,750] {taskinstance.py:686} DEBUG - dependency 'Previous Dagrun State' PASSED: True, The task did not have depends_on_past set.
[2020-04-29 21:51:53,754] {taskinstance.py:686} DEBUG - dependency 'Not In Retry Period' PASSED: True, The task instance was not marked for retrying.
[2020-04-29 21:51:53,754] {taskinstance.py:686} DEBUG - dependency 'Task Instance State' PASSED: True, Task state queued was valid.
[2020-04-29 21:51:53,754] {taskinstance.py:669} INFO - Dependencies all met for
[2020-04-29 21:51:53,757] {taskinstance.py:686} DEBUG - dependency 'Previous Dagrun State' PASSED: True, The task did not have depends_on_past set.
[2020-04-29 21:51:53,760] {taskinstance.py:686} DEBUG - dependency 'Pool Slots Available' PASSED: True, ('There are enough open slots in %s to execute the task', 'default_pool')
[2020-04-29 21:51:53,766] {taskinstance.py:686} DEBUG - dependency 'Not In Retry Period' PASSED: True, The task instance was not marked for retrying.
[2020-04-29 21:51:53,768] {taskinstance.py:686} DEBUG - dependency 'Task Concurrency' PASSED: True, Task concurrency is not set.
[2020-04-29 21:51:53,768] {taskinstance.py:669} INFO - Dependencies all met for
[2020-04-29 21:51:53,768] {taskinstance.py:879} INFO -
[2020-04-29 21:51:53,768] {taskinstance.py:880} INFO - Starting attempt 1 of 2
[2020-04-29 21:51:53,768] {taskinstance.py:881} INFO -
[2020-04-29 21:51:53,779] {taskinstance.py:900} INFO - Executing on 2020-04-29T13:51:44.071744+00:00
[2020-04-29 21:51:53,781] {standard_task_runner.py:53} INFO - Started process 29718 to run task
[2020-04-29 21:51:53,805] {logging_mixin.py:112} INFO - [2020-04-29 21:51:53,805] {cli_action_loggers.py:68} DEBUG - Calling callbacks: []
[2020-04-29 21:51:53,818] {logging_mixin.py:112} INFO - [2020-04-29 21:51:53,817] {cli_action_loggers.py:86} DEBUG - Calling callbacks: []
[2020-04-29 21:51:58,759] {logging_mixin.py:112} INFO - [2020-04-29 21:51:58,759] {base_job.py:200} DEBUG - [heartbeat]
[2020-04-29 21:51:58,759] {logging_mixin.py:112} INFO - [2020-04-29 21:51:58,759] {local_task_job.py:124} DEBUG - Time since last heartbeat(0.01 s) < heartrate(5.0 s), sleeping for 4.98824 s
[2020-04-29 21:52:03,753] {logging_mixin.py:112} INFO - [2020-04-29 21:52:03,753] {local_task_job.py:103} INFO - Task exited with return code 1

How to reproduce it:

I really don't know how to reproduce it, because it happened suddenly and seems to be permanent.

Anything else we need to know:

I tried to figure out the difference between 'airflow test' and 'airflow run'; I guess it might have something to do with the process fork?
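For context on why the failure is so silent: 'airflow run' (via the standard task runner) forks a child process to execute the task, and the parent only ever sees the child's numeric exit status, not the reason it died. This is a minimal stdlib sketch of that pattern, not Airflow's actual code:

```python
import os
import sys

def run_forked(task):
    """Fork, run `task` in the child, and return the child's exit code."""
    pid = os.fork()
    if pid == 0:
        # Child: run the task; an uncaught SystemExit sets the exit code.
        try:
            task()
            code = 0
        except SystemExit as e:
            code = e.code or 0
        os._exit(code)
    # Parent: all it can observe is the numeric status -- no traceback --
    # which is why the log only says "Task exited with return code 1".
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)

def failing_task():
    # Imagine a fatal problem (e.g. a config parsing error) that occurs
    # in the child before any task logging has been set up.
    sys.exit(1)

print("Task exited with return code", run_forked(failing_task))
```

If the child dies before logging is configured, the parent's "return code 1" line is the only trace left, which matches what I see in the task log above.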

What I've tried to solve this problem, all of which failed:

Upvotes: 2

Views: 9972

Answers (1)

Raven

Reputation: 41

I finally figured out how to reproduce this bug.

When you configure email in airflow.cfg and your DAG contains an email operator or uses the SMTP service, and your SMTP password contains a character like "^", the first task of your DAG will always exit with return code 1 without any error information. In my case the first task is merely a Python operator.
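For reference, the relevant section of airflow.cfg in 1.10.x looks roughly like this (host, user, and address values below are placeholders; it was the smtp_password value that triggered the failure for me):

```ini
[smtp]
smtp_host = smtp.example.com
smtp_starttls = True
smtp_ssl = False
smtp_user = airflow@example.com
# A password containing a character such as "^" triggered the
# silent exit-code-1 failure described above.
smtp_password = my^password
smtp_port = 587
smtp_mail_from = airflow@example.com
```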

Although I think it was my own fault for messing up the SMTP configuration, there should be some reasonable hint. It actually took me a whole week to debug this: I had to reset everything in my Airflow environment and slowly change the configuration to see when the bug appears.

Hope this information is helpful.

Upvotes: 2
