Celery worker's log contains question marks (???) instead of correct unicode characters

Question

I'm using Celery 3.1.18 with Python 2.7.8 on CentOS 6.5.

In a Celery task module, I have the following code:

# someapp/tasks.py
from celery import shared_task
from celery.utils.log import get_task_logger

logger = get_task_logger(__name__)


@shared_task()
def foo():
    logger.info('Test output: %s', u"测试中")

I use the init.d script to run a Celery worker. I also put the following settings in /etc/default/celeryd:

CELERYD_NODES="bar"

# %N will be replaced with the first part of the nodename.
CELERYD_LOG_FILE="/var/log/celery/%N.log"

# Workers should run as an unprivileged user.
#   You need to create this user manually (or you can choose
#   a user/group combination that already exists, e.g. nobody).
CELERYD_USER="nobody"
CELERYD_GROUP="nobody"

So my log file is located in /var/log/celery/bar.log.

However, once the task is executed by the worker, the above log file shows:

[2015-05-07 03:51:14,438: INFO/Worker-1/someapp.tasks.foo(...)] Test output: ???

The unicode characters are gone, replaced with a number of question marks.

How can I get back the unicode characters in the log file?

WKPlus · Accepted Answer

You need to set LANG=zh_CN.UTF-8 in the environment in which you start your Celery application.

If you are using the celeryd init script, there is a simple way: set CELERY_BIN="env LANG=zh_CN.UTF-8 /path/to/celery/binary" in /etc/default/celeryd.
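
To confirm which locale the worker process actually sees, a small diagnostic can help. The sketch below is only an illustration: describe_encoding_environment is a hypothetical helper name, not part of Celery, and it relies solely on standard-library calls. Calling it from inside a task and logging the result should show whether the LANG setting reached the worker.

```python
from __future__ import print_function

import locale
import os
import sys


def describe_encoding_environment():
    """Collect the locale-related values that influence text encoding.

    Hypothetical helper for illustration; not part of Celery or kombu.
    """
    return {
        "LANG": os.environ.get("LANG"),      # None when the variable is unset
        "LC_ALL": os.environ.get("LC_ALL"),  # takes precedence over LANG if set
        "filesystem_encoding": sys.getfilesystemencoding(),
        "preferred_encoding": locale.getpreferredencoding(),
    }


if __name__ == "__main__":
    for name, value in sorted(describe_encoding_environment().items()):
        print("%s = %r" % (name, value))
```

If the reported encodings are ASCII variants (for example ANSI_X3.4-1968) rather than UTF-8, the worker was most likely started without a UTF-8 locale.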

Explanation:

Celery uses ColorFormatter, which is defined in celery.utils.log, for message formatting.
ColorFormatter converts unicode to str with kombu.utils.encoding.safe_str.
kombu.utils.encoding.safe_str encodes unicode to str using the encoding returned by default_encoding, which is defined in kombu.utils.encoding.
default_encoding returns getattr(get_default_encoding_file(), 'encoding', None) or sys.getfilesystemencoding().
Besides, I did not find any place where Celery sets the encoding explicitly, so I believe Celery uses sys.getfilesystemencoding() as the encoding when converting unicode to str.
The documentation for sys.getfilesystemencoding says:

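On Unix, the encoding is the user’s preference according to the result of nl_langinfo(CODESET), or None if the nl_langinfo(CODESET) failed

So setting LANG=zh_CN.UTF-8 in the Celery process environment tells Celery to convert unicode to str using UTF-8. When LANG is unset, as is common for a process started from an init script, the locale presumably falls back to C/POSIX, whose codeset is ASCII, so characters that cannot be encoded come out as question marks.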
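
The effect of the encoding choice can be reproduced without Celery. The following is a minimal sketch, assuming the conversion boils down to an encode call with the 'replace' error handler, which is what the question marks in the log suggest. encode_for_log is a hypothetical stand-in for illustration, not the actual kombu implementation.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function


def encode_for_log(text, encoding):
    """Encode text to bytes, replacing unencodable characters with '?'.

    Hypothetical stand-in for the unicode-to-str conversion described
    above; the 'replace' error handler is an assumption.
    """
    return text.encode(encoding, "replace")


message = u"测试中"

# With an ASCII encoding, each of the three characters becomes '?'.
ascii_bytes = encode_for_log(message, "ascii")

# With UTF-8, the characters survive and round-trip intact.
utf8_bytes = encode_for_log(message, "utf-8")

if __name__ == "__main__":
    print(repr(ascii_bytes))
    print(utf8_bytes.decode("utf-8") == message)
```

Under an ASCII encoding the three characters collapse into "???", matching the log line in the question, while UTF-8 keeps them intact.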
