Reputation: 307
I'm using Apache Airflow and noticed that gunicorn-error.log has grown to over 50 GB within 5 months. Most of the log messages are INFO-level entries like:
[2018-05-14 17:31:39 +0000] [29595] [INFO] Handling signal: ttou
[2018-05-14 17:32:37 +0000] [2359] [INFO] Worker exiting (pid: 2359)
[2018-05-14 17:33:07 +0000] [29595] [INFO] Handling signal: ttin
[2018-05-14 17:33:07 +0000] [5758] [INFO] Booting worker with pid: 5758
[2018-05-14 17:33:10 +0000] [29595] [INFO] Handling signal: ttou
[2018-05-14 17:33:41 +0000] [2994] [INFO] Worker exiting (pid: 2994)
[2018-05-14 17:34:11 +0000] [29595] [INFO] Handling signal: ttin
[2018-05-14 17:34:11 +0000] [6400] [INFO] Booting worker with pid: 6400
[2018-05-14 17:34:13 +0000] [29595] [INFO] Handling signal: ttou
[2018-05-14 17:34:36 +0000] [3611] [INFO] Worker exiting (pid: 3611)
Within the Airflow config file I'm only able to set the log file path. Does anyone know how to change the gunicorn log level from within Airflow? I don't need such fine-grained logging, and it is filling up my hard drive.
Upvotes: 3
Views: 2075
Reputation: 799
I managed to solve the problem by setting an environment variable:
GUNICORN_CMD_ARGS="--log-level WARNING"
If setting this in a docker-compose.yml file, the following is tested with apache-airflow==1.10.6 and gunicorn==19.9.0:
environment:
- 'GUNICORN_CMD_ARGS=--log-level WARNING'
If setting this in a Dockerfile, the following is tested with apache-airflow==1.10.6 and gunicorn==19.9.0:
ENV GUNICORN_CMD_ARGS --log-level WARNING
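If you run the webserver directly (outside Docker), the same variable can simply be exported in the shell before launching; gunicorn reads GUNICORN_CMD_ARGS from the environment and appends its contents to its own command line:

```shell
# Export before starting the webserver (e.g. `airflow webserver`);
# gunicorn picks up GUNICORN_CMD_ARGS from the environment.
export GUNICORN_CMD_ARGS="--log-level WARNING"
```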
Upvotes: 4
Reputation: 719
Logging seems a bit tricky to me in Airflow. One of the reasons is that logging is split into several parts: for instance, the logging configuration for Airflow itself is completely separate from that of the gunicorn webserver (the "spam" logs you mention come from gunicorn).
To solve this spam problem, I modified Airflow's bin/cli.py slightly by adding a few lines to the webserver() function:
if args.log_config:
run_args += ['--log-config', str(args.log_config)]
(for the sake of brevity I haven't pasted the code to handle the argument)
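The elided argument handling might look something like the following argparse sketch (the option name, destination, and parser layout here are my assumptions for illustration, not the answer's actual code):

```python
# Hypothetical sketch of the elided argument handling: register a
# --log-config option on the webserver subcommand and forward its
# value to gunicorn's command line.
import argparse

parser = argparse.ArgumentParser(prog="airflow")
subparsers = parser.add_subparsers(dest="subcommand")
webserver = subparsers.add_parser("webserver")
webserver.add_argument("--log-config", dest="log_config", default=None,
                       help="path to a gunicorn logging config file")

args = parser.parse_args(["webserver", "--log-config", "/etc/gunicorn-log.conf"])

run_args = ["gunicorn"]  # the real run_args carries many more gunicorn options
if args.log_config:
    run_args += ["--log-config", str(args.log_config)]
```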
And then, as a log config file, I use something similar to:
[loggers]
keys=root, gunicorn.error, gunicorn.access
[handlers]
keys=console, error_file, access_file
[formatters]
keys=generic, access
[logger_root]
level=INFO
handlers=console
[logger_gunicorn.error]
level=INFO
handlers=error_file
propagate=0
qualname=gunicorn.error
[logger_gunicorn.access]
level=INFO
handlers=access_file
propagate=1
qualname=gunicorn.access
[handler_console]
class=StreamHandler
formatter=generic
args=(sys.stdout, )
[handler_error_file]
class=logging.handlers.TimedRotatingFileHandler
formatter=generic
args=('/home/airflow/airflow/logs/webserver/gunicorn.error.log',)
[handler_access_file]
class=logging.handlers.TimedRotatingFileHandler
formatter=access
args=('/home/airflow/airflow/logs/webserver/gunicorn.access.log',)
[formatter_generic]
format=[%(name)s] [%(module)s] [%(asctime)s] {%(filename)s:%(lineno)d} %(levelname)s - %(message)s
#format=[%(levelname)s] %(asctime)s [%(process)d] [%(levelname)s] %(message)s
datefmt=%Y-%m-%d %H:%M:%S
class=logging.Formatter
[formatter_access]
format=%(message)s
class=logging.Formatter
Note the propagate=0 in the gunicorn.error logger, which keeps the spam out of your stdout. You still get the messages, but at least they are confined to /home/airflow/airflow/logs/webserver/gunicorn.error.log, which should be rotated by the TimedRotatingFileHandler (to be honest, I haven't fully tested the rotation part yet).
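The file above is a standard Python logging config, which is what gunicorn's --log-config flag loads via logging.config.fileConfig. As a sanity check, a minimal self-contained variant of it can be loaded directly; the paths here are redirected to a temp directory (an assumption so the sketch runs anywhere) and the format line is simplified:

```python
# Sketch: load a gunicorn-style logging config with fileConfig and
# verify that the gunicorn.error logger filters below WARNING and
# writes to its own rotating file instead of stdout.
import logging
import logging.config
import os
import tempfile
import textwrap

tmp = tempfile.mkdtemp()
conf = textwrap.dedent(f"""\
    [loggers]
    keys=root, gunicorn.error

    [handlers]
    keys=console, error_file

    [formatters]
    keys=generic

    [logger_root]
    level=INFO
    handlers=console

    [logger_gunicorn.error]
    level=WARNING
    handlers=error_file
    propagate=0
    qualname=gunicorn.error

    [handler_console]
    class=StreamHandler
    formatter=generic
    args=(sys.stdout,)

    [handler_error_file]
    class=logging.handlers.TimedRotatingFileHandler
    formatter=generic
    args=('{tmp}/gunicorn.error.log', 'midnight', 1, 7)

    [formatter_generic]
    format=[%(asctime)s] [%(process)d] [%(levelname)s] %(message)s
    class=logging.Formatter
    """)
path = os.path.join(tmp, "log.conf")
with open(path, "w") as f:
    f.write(conf)

logging.config.fileConfig(path, disable_existing_loggers=False)
log = logging.getLogger("gunicorn.error")
log.info("suppressed")          # below WARNING: dropped
log.warning("worker exiting")   # written to the rotating log file
```

The extra positional args on the TimedRotatingFileHandler ('midnight', 1, 7) map to its when, interval, and backupCount parameters, which is what makes the rotation actually happen.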
If I have time, I'll submit this change as a Jira ticket for Airflow.
Upvotes: 1