some user

Reputation: 337

Print succeeds but logging module throws exception

I'm trying to log the contents of a file, but I get some funny behavior from the logging module (and not only that one).

Here is the file contents:

"Testing …"
Testing å¨'æøöä
"Testing å¨'æøöä"

And here is how I open and log it:

with codecs.open(f, "r", encoding="utf-8") as myfile:
    script = myfile.read()
    log.debug("Script type: {}".format(type(script)))
    print(script)
    log.debug("{}".format(script.encode("utf8")))

The line where I log the type of the object shows up as follows in my logs:

Script type: <type 'unicode'>

The print line then prints the contents correctly to the console, but the logging module throws an exception:

Traceback (most recent call last):
  File "/usr/lib/python2.7/logging/__init__.py", line 882, in emit
    stream.write(fs % msg.encode("UTF-8"))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 63: ordinal not in range(128)

When I remove the .encode("utf8") bit from that last line, I get the expected exception:

'ascii' codec can't encode character u'\u2026' in position 9: ordinal not in range(128)

This is just to demonstrate the problem. It's not only the logging module: the rest of my code also throws similar exceptions when dealing with this "unicode" string.

What am I doing wrong?

Upvotes: 0

Views: 888

Answers (1)

Martijn Pieters

Reputation: 1121744

Logging handles Unicode values just fine:

>>> import logging
>>> logging.basicConfig(level=logging.DEBUG)
>>> script = u'"Testing …"'
>>> logging.debug(script)
DEBUG:root:"Testing …"

(Writing to a log file results in UTF-8 encoded messages.)

Where you went wrong is in mixing byte strings and Unicode values when using str.format():

>>> "{}".format(script)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2026' in position 9: ordinal not in range(128)
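
Both exceptions in the question come down to an implicit ASCII conversion in Python 2. A rough sketch of the two directions, with the conversions written out explicitly so it also runs on Python 3 (the variable names here are only illustrative): a byte format string forces the Unicode value to be encoded as ASCII, and calling .encode() on data that is already bytes forces Python 2 to decode it as ASCII first.

```python
# -*- coding: utf-8 -*-
script = u'"Testing \u2026"'

# Direction 1: what "{}".format(script) does implicitly in Python 2.
# The Unicode value is encoded with the ASCII codec, which cannot
# represent U+2026 (the ellipsis).
try:
    script.encode("ascii")
    encode_error = None
except UnicodeEncodeError as exc:
    encode_error = exc

# Direction 2: what happens in Python 2 when .encode() is called on a
# value that is already a byte string. The bytes are first decoded as
# ASCII, and 0xe2 (the first UTF-8 byte of U+2026) is not valid ASCII.
encoded = script.encode("utf-8")
try:
    encoded.decode("ascii")
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc

# Character and index that the ASCII codec rejected while encoding.
print(repr(encode_error.object[encode_error.start]), encode_error.start)
# Index of the first byte that the ASCII codec rejected while decoding.
print(decode_error.start)
```

The first case is the UnicodeEncodeError shown above; the second is the same kind of failure as the UnicodeDecodeError raised inside logging once the message has already been encoded to UTF-8.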

If you use a unicode format string, you avoid the forced implicit encoding:

>>> u"{}".format(script)
u'"Testing \u2026"'
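
Applied to the code in the question, that could look roughly like the sketch below. It is not the original script: the temporary file, the logger name and the in-memory stream are stand-ins so the example is self-contained, and io.open is used in place of codecs.open (both return decoded text). The point is only that the message stays Unicode all the way to the logging call, with no .encode() in between.

```python
# -*- coding: utf-8 -*-
import io
import logging
import os
import tempfile

# Hypothetical sample file standing in for the `f` from the question.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with io.open(path, "w", encoding="utf-8") as out:
    out.write(u'"Testing \u2026"\n')

# Log into an in-memory text stream so the output can be inspected.
stream = io.StringIO()
log = logging.getLogger("unicode_demo")  # hypothetical logger name
log.setLevel(logging.DEBUG)
log.addHandler(logging.StreamHandler(stream))
log.propagate = False

with io.open(path, "r", encoding="utf-8") as myfile:
    script = myfile.read()  # already decoded text, not bytes
    # Unicode format string: no implicit ASCII encoding takes place.
    log.debug(u"Script: {}".format(script))
    # Alternative: pass the value as an argument and let logging
    # do the interpolation.
    log.debug(u"Script: %s", script)

os.remove(path)
captured = stream.getvalue()
print(captured)
```

Passing the value as a logging argument has the side benefit that the interpolation is skipped entirely when the message is filtered out by level.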

Upvotes: 1
