Reputation: 121
My Airflow webserver suddenly stopped starting. When I try to start the webserver, the UI never comes up.
I tried resetting the database with airflow resetdb and airflow initdb, restarting all the services, downgrading Gunicorn and upgrading it again, and restarting my Linux machine, but nothing has changed.
The webserver logs are as follows:
[2019-05-17 08:08:00 +0000] [14978] [INFO] Starting gunicorn 19.9.0
[2019-05-17 08:08:00 +0000] [14978] [INFO] Listening at: http://0.0.0.0:8081 (14978)
[2019-05-17 08:08:00 +0000] [14978] [INFO] Using worker: sync
[2019-05-17 08:08:00 +0000] [14983] [INFO] Booting worker with pid: 14983
[2019-05-17 08:08:00 +0000] [14984] [INFO] Booting worker with pid: 14984
[2019-05-17 08:08:00 +0000] [14985] [INFO] Booting worker with pid: 14985
[2019-05-17 08:08:00 +0000] [14986] [INFO] Booting worker with pid: 14986
[2019-05-17 08:08:02,179] {__init__.py:51} INFO - Using executor LocalExecutor
[2019-05-17 08:08:02,279] {__init__.py:51} INFO - Using executor LocalExecutor
[2019-05-17 08:08:02,324] {__init__.py:51} INFO - Using executor LocalExecutor
[2019-05-17 08:08:02,342] {models.py:273} INFO - Filling up the DagBag from /root/airflow/dags
[2019-05-17 08:08:02,376] {__init__.py:51} INFO - Using executor LocalExecutor
[2019-05-17 08:08:02,435] {models.py:273} INFO - Filling up the DagBag from /root/airflow/dags
[2019-05-17 08:08:02,521] {models.py:273} INFO - Filling up the DagBag from /root/airflow/dags
[2019-05-17 08:08:02,524] {models.py:273} INFO - Filling up the DagBag from /root/airflow/dags
[2019-05-17 08:10:00 +0000] [14978] [CRITICAL] WORKER TIMEOUT (pid:14984)
[2019-05-17 08:10:00 +0000] [14978] [CRITICAL] WORKER TIMEOUT (pid:14985)
[2019-05-17 08:10:00 +0000] [14978] [CRITICAL] WORKER TIMEOUT (pid:14986)
[2019-05-17 08:10:00 +0000] [14978] [CRITICAL] WORKER TIMEOUT (pid:14983)
[2019-05-17 08:10:01 +0000] [15161] [INFO] Booting worker with pid: 15161
[2019-05-17 08:10:01 +0000] [15164] [INFO] Booting worker with pid: 15164
[2019-05-17 08:10:01 +0000] [15167] [INFO] Booting worker with pid: 15167
[2019-05-17 08:10:01 +0000] [15168] [INFO] Booting worker with pid: 15168
[2019-05-17 08:10:03,953] {__init__.py:51} INFO - Using executor LocalExecutor
[2019-05-17 08:10:04,007] {__init__.py:51} INFO - Using executor LocalExecutor
[2019-05-17 08:10:04,020] {__init__.py:51} INFO - Using executor LocalExecutor
[2019-05-17 08:10:04,036] {__init__.py:51} INFO - Using executor LocalExecutor
Has anyone else encountered the same problem, or do you have any suggestions?
Upvotes: 7
Views: 7681
Reputation: 45321
If you are on version 1.10.7 or above, try turning on the DAG Serialization option for both the webserver and the scheduler. This stops the webservers from parsing DAG files themselves; the scheduler's parser runs as a subprocess with a per-file timeout. You should still find the Python file that is probably using a network resource outside a task execution context, and fix that… but this can help.
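As a minimal airflow.cfg sketch (option names as they appeared in the 1.10.7 release; verify against your version's docs and restart both services after changing them):

    [core]
    # Serialize DAGs into the metadata DB so the webserver reads them from there
    # instead of re-parsing every DAG file itself.
    store_serialized_dags = True
    # How often, in seconds, the serialized DAGs in the DB are refreshed.
    min_serialized_dag_update_interval = 30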
Alternatively, move the DAG folder to another directory, create an empty directory, and make the original DAG folder path a symbolic link to the empty one. Start your webserver(s), then change the symbolic link to point at your real DAG directory again.
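In shell terms, using the /root/airflow/dags path from your logs (the other paths here are assumptions, adjust to your setup):

    # park the real DAGs and point the configured dags_folder at an empty dir
    mv /root/airflow/dags /root/airflow/dags_real
    mkdir /root/airflow/dags_empty
    ln -s /root/airflow/dags_empty /root/airflow/dags

    # start the webserver against the empty folder, then (in another terminal)
    # swap the link back to the real DAGs
    airflow webserver -p 8081
    ln -sfn /root/airflow/dags_real /root/airflow/dags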
Personally, I don't know why the DagBag has to be populated before responding to any web requests. Why can't building it be a background process?
Upvotes: 0
Reputation: 958
In my case, some of my DAGs have lots of Fargate tasks (I used the ECSOperator), and each of these tasks uses a subroutine I wrote to obtain the subnet IDs and security group IDs via boto3's describe_subnets and describe_security_groups. Apparently these calls were crashing the server.
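The fix that follows from this, sketched under the assumption that the lookups can run inside the task rather than at module level (the helper and VPC names below are hypothetical):

    import boto3

    def get_network_config(vpc_id):
        # Look up subnet and security group IDs; call this from a running task only.
        ec2 = boto3.client("ec2")
        subnets = ec2.describe_subnets(
            Filters=[{"Name": "vpc-id", "Values": [vpc_id]}]
        )["Subnets"]
        sgs = ec2.describe_security_groups(
            Filters=[{"Name": "vpc-id", "Values": [vpc_id]}]
        )["SecurityGroups"]
        return [s["SubnetId"] for s in subnets], [g["GroupId"] for g in sgs]

    # BAD: a module-level call runs every time the webserver/scheduler parses the file
    # subnet_ids, sg_ids = get_network_config("vpc-0123456789")

    # BETTER: defer it to task execution time, e.g. inside a python_callable
    def build_task_kwargs(**context):
        subnet_ids, sg_ids = get_network_config("vpc-0123456789")
        ...

Module-level code in a DAG file runs on every parse, so any AWS call made there is repeated by each Gunicorn worker and can push it past the worker timeout seen in the logs above.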
Upvotes: 0
Reputation: 1410
I faced the same issue today: the Airflow webserver stopped starting. I tried a lot but was not able to determine the cause; nothing worked, neither resetdb nor upgradedb, and reinstalling didn't help either. Then I simply commented out all the code inside my DAGs and manually created a .pyc file for each DAG in the dag folder, and Airflow started working again. I observed that the issue was with the DAGs: when I removed them, the server started functioning normally. So my advice to anyone facing this issue is to check your DAGs; there is definitely something wrong inside them. Don't blame Airflow, sometimes our own code messes with the system.
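To narrow down which DAG file is the culprit without starting the webserver, one option is a small script against the DagBag API (a sketch; run it with the same environment and AIRFLOW_HOME as the webserver):

    # Parse the configured dags_folder the same way the webserver would.
    from airflow.models import DagBag

    dag_bag = DagBag()
    print("DAGs parsed:", len(dag_bag.dags))
    for filepath, error in dag_bag.import_errors.items():
        print(filepath, "->", error)

If this script itself hangs, a DAG file is blocking at import time (for example on a network call), which is exactly what makes the Gunicorn workers time out.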
Upvotes: 4
Reputation: 5000
In my case, one of my DAGs connects to a MySQL database via an SSH tunnel. When I connect to MySQL directly it works, but via the SSH tunnel it fails. I'm not sure why, but for now I have switched the DAG to a direct MySQL connection.
The webserver was not starting Gunicorn because the DAG was not able to connect to MySQL.
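If the tunnel is still needed, the same parse-time-versus-run-time rule from the other answers applies. A hedged sketch (the sshtunnel/pymysql stack and all hosts and credentials below are assumptions) that keeps every network call inside the task callable:

    import pymysql
    from sshtunnel import SSHTunnelForwarder

    def query_mysql(**context):
        # All networking happens at task run time, never while the DAG file is parsed.
        with SSHTunnelForwarder(
            ("bastion.example.com", 22),
            ssh_username="airflow",
            ssh_pkey="/root/.ssh/id_rsa",
            remote_bind_address=("mysql.internal", 3306),
        ) as tunnel:
            conn = pymysql.connect(
                host="127.0.0.1",
                port=tunnel.local_bind_port,
                user="airflow",
                password="***",
                database="mydb",
            )
            try:
                with conn.cursor() as cur:
                    cur.execute("SELECT 1")
                    print(cur.fetchone())
            finally:
                conn.close()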
Upvotes: 0
Reputation: 77
This is a possible solution that worked for me.
Make sure the dags_folder doesn't contain any files that are not relevant to your DAG definitions and configuration. The Airflow webserver periodically scans the dags_folder, and I found that if this folder is very large, the scans cause the server to stall.
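If extra files have to live there, an .airflowignore file in the dags_folder can exclude them from the scan (a sketch; each line is a regex matched against file paths, and the names below are only examples):

    # .airflowignore, placed at the root of the dags_folder
    helper_scripts
    data
    .*_test\.py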
Hope this helps you :)
Upvotes: 2