Nihad Azimli

Reputation: 121

Airflow webserver suddenly stopped starting

My Airflow webserver suddenly stopped starting. When I try to start the webserver, the UI never comes up.

I tried resetting the database with airflow resetdb and airflow initdb, restarting all the services, downgrading Gunicorn and upgrading it again, and restarting my Linux machine; however, nothing has changed.

The webserver logs are as follows:

[2019-05-17 08:08:00 +0000] [14978] [INFO] Starting gunicorn 19.9.0
[2019-05-17 08:08:00 +0000] [14978] [INFO] Listening at: http://0.0.0.0:8081 (14978)
[2019-05-17 08:08:00 +0000] [14978] [INFO] Using worker: sync
[2019-05-17 08:08:00 +0000] [14983] [INFO] Booting worker with pid: 14983
[2019-05-17 08:08:00 +0000] [14984] [INFO] Booting worker with pid: 14984
[2019-05-17 08:08:00 +0000] [14985] [INFO] Booting worker with pid: 14985
[2019-05-17 08:08:00 +0000] [14986] [INFO] Booting worker with pid: 14986
[2019-05-17 08:08:02,179] {__init__.py:51} INFO - Using executor LocalExecutor
[2019-05-17 08:08:02,279] {__init__.py:51} INFO - Using executor LocalExecutor
[2019-05-17 08:08:02,324] {__init__.py:51} INFO - Using executor LocalExecutor
[2019-05-17 08:08:02,342] {models.py:273} INFO - Filling up the DagBag from /root/airflow/dags
[2019-05-17 08:08:02,376] {__init__.py:51} INFO - Using executor LocalExecutor
[2019-05-17 08:08:02,435] {models.py:273} INFO - Filling up the DagBag from /root/airflow/dags
[2019-05-17 08:08:02,521] {models.py:273} INFO - Filling up the DagBag from /root/airflow/dags
[2019-05-17 08:08:02,524] {models.py:273} INFO - Filling up the DagBag from /root/airflow/dags
[2019-05-17 08:10:00 +0000] [14978] [CRITICAL] WORKER TIMEOUT (pid:14984)
[2019-05-17 08:10:00 +0000] [14978] [CRITICAL] WORKER TIMEOUT (pid:14985)
[2019-05-17 08:10:00 +0000] [14978] [CRITICAL] WORKER TIMEOUT (pid:14986)
[2019-05-17 08:10:00 +0000] [14978] [CRITICAL] WORKER TIMEOUT (pid:14983)
[2019-05-17 08:10:01 +0000] [15161] [INFO] Booting worker with pid: 15161
[2019-05-17 08:10:01 +0000] [15164] [INFO] Booting worker with pid: 15164
[2019-05-17 08:10:01 +0000] [15167] [INFO] Booting worker with pid: 15167
[2019-05-17 08:10:01 +0000] [15168] [INFO] Booting worker with pid: 15168
[2019-05-17 08:10:03,953] {__init__.py:51} INFO - Using executor LocalExecutor
[2019-05-17 08:10:04,007] {__init__.py:51} INFO - Using executor LocalExecutor
[2019-05-17 08:10:04,020] {__init__.py:51} INFO - Using executor LocalExecutor
[2019-05-17 08:10:04,036] {__init__.py:51} INFO - Using executor LocalExecutor

Has anyone encountered the same problem, or do you have any suggestions?

Upvotes: 7

Views: 7681

Answers (5)

dlamblin

Reputation: 45321

If you are on version 1.10.7 or above, try turning on the DAG Serialization option in both the webserver and the scheduler. This stops the webserver from parsing DAG files; the scheduler's parser is a subprocess with a per-file timeout. You should still find the Python file that is probably using a network resource outside a task execution context and fix that, but this can help.
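
A minimal sketch of the corresponding airflow.cfg change, assuming Airflow 1.10.7+ where serialization is controlled under [core] (exact option names vary slightly between 1.10.x releases):

[core]
# Webserver reads DAGs from the metadata DB instead of parsing files itself
store_serialized_dags = True
# How often (in seconds) serialized DAGs are refreshed in the DB
min_serialized_dag_update_interval = 30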

Alternatively, move the DAG folder to a differently named directory, create an empty directory, and make the original DAG folder path a symbolic link to the empty one. Start your webserver(s), then point the symbolic link back at your real DAG directory, as sketched below.
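
A rough shell sketch of that workaround, assuming the dags_folder from the logs above (/root/airflow/dags); the directory names are placeholders:

mv /root/airflow/dags /root/airflow/dags_real         # set the real DAGs aside
mkdir /root/airflow/dags_empty                        # empty stand-in
ln -s /root/airflow/dags_empty /root/airflow/dags     # webserver now sees no DAGs
airflow webserver                                     # boots without stalling on parses
ln -sfn /root/airflow/dags_real /root/airflow/dags    # swap the real DAGs back in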

Personally, I don't know why the DagBag has to be populated before responding to any web requests. Why can't building it be a background process?

Upvotes: 0

Zach

Reputation: 958

In my case, some of my DAGs have lots of Fargate tasks (I used the ECSOperator), and each of these tasks uses a subroutine I wrote to obtain the subnet IDs and security group IDs via boto3's describe_subnets and describe_security_groups. Apparently these calls were crashing the server.
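
A hypothetical sketch of the fix: run the boto3 lookups inside the task callable so they happen at execution time, not while the webserver parses the file (the DAG and function names here are made up for illustration):

from datetime import datetime

import boto3
from airflow import DAG
from airflow.operators.python_operator import PythonOperator


def get_network_config():
    # Runs only when the task executes, never during webserver DAG parsing
    ec2 = boto3.client("ec2")
    subnets = [s["SubnetId"] for s in ec2.describe_subnets()["Subnets"]]
    groups = [g["GroupId"] for g in ec2.describe_security_groups()["SecurityGroups"]]
    return subnets, groups


def launch_fargate_task():
    subnets, groups = get_network_config()
    print(subnets, groups)  # pass these to the ECS/Fargate call here


dag = DAG("example_fargate", start_date=datetime(2019, 1, 1), schedule_interval=None)

PythonOperator(task_id="launch", python_callable=launch_fargate_task, dag=dag)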

Upvotes: 0

Shahbaz Ali

Reputation: 1410

I faced the same issue today: the Airflow webserver stopped starting. I tried a lot but was not able to determine the cause; neither resetdb nor upgradedb worked, and reinstalling didn't help either. Then I simply commented out all the code inside my DAGs and manually created a .pyc file for each DAG in the dag folder, and Airflow started working again. I observed that the issue was with the DAGs: when I removed them, the server started functioning normally. So my advice to anyone facing this issue is to check your DAGs; there is definitely something wrong within them. Don't blame Airflow; sometimes our own code messes with the system.
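
One way to confirm the DAGs are at fault without starting the webserver is to load the DagBag directly and print any import errors; a minimal sketch, run in the same environment and AIRFLOW_HOME as the webserver:

from airflow.models import DagBag

# Parses everything in the configured dags_folder, just as the webserver would
dagbag = DagBag()
for path, error in dagbag.import_errors.items():
    print(path, error)

# A DAG file that hangs (e.g. on a network call at import time) will stall
# this script too, pointing at the file behind the gunicorn worker timeouts.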

Upvotes: 4

karthikeayan

Reputation: 5000

In my case, one of my DAGs connects to a MySQL database via an SSH tunnel. When I connect directly to MySQL it works, but via the SSH tunnel it fails. I'm not sure why, but for now I have moved to a direct connection to MySQL from the DAG.

The webserver was not starting Gunicorn workers because the DAG was not able to connect to MySQL.
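
Whatever the tunnel problem is, the general fix is the same as above: open the connection inside the task callable, so a failing database blocks one task run instead of every webserver worker. A minimal sketch, assuming Airflow 1.10's MySqlHook and a hypothetical mysql_default connection:

from datetime import datetime

from airflow import DAG
from airflow.hooks.mysql_hook import MySqlHook
from airflow.operators.python_operator import PythonOperator


def query_mysql():
    # The connection is opened at task run time, not at DAG parse time
    hook = MySqlHook(mysql_conn_id="mysql_default")
    return hook.get_records("SELECT 1")


dag = DAG("example_mysql", start_date=datetime(2019, 1, 1), schedule_interval=None)

PythonOperator(task_id="query", python_callable=query_mysql, dag=dag)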

Upvotes: 0

Yoav Gaudin

Reputation: 77

This is a possible solution that worked for me.

Make sure the dags_folder doesn't contain any files that are not relevant to your DAG definitions and configuration.

The Airflow webserver periodically scans the dags_folder, and I found that if this folder is very large, the scans cause the server to stall.
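
If unrelated files have to live under the dags_folder, one option is an .airflowignore file at its root; each line is a regular-expression pattern for paths the scanner skips. A hypothetical example (the names below are placeholders):

helper_scripts
data_exports
.*_test\.py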

Hope this helps you :)

Upvotes: 2
