ws6079

Reputation: 343

Python default character encoding handling

I've seen several posts related to this, but no clear answer. Say I want to print the string s = u'\xe9\xe1' in a terminal that only supports ASCII (e.g., LC_ALL=C; python3). Is there any way to configure the following as the default behaviour:

import sys
s = u'\xe9\xe1'
s = s.encode(sys.stdout.encoding, 'replace').decode(sys.stdout.encoding)
print(s)

I.e., I want the string to print something, even garbage, rather than raising an exception (UnicodeEncodeError). I'm using Python 3.5.

I would like to avoid writing this for every string that may contain non-ASCII characters.

Upvotes: 0

Views: 304

Answers (1)

Martijn Pieters

Reputation: 1121266

You can do one of three things:

  • Adjust the error handler for stdout and stderr with the PYTHONIOENCODING environment variable:

    export PYTHONIOENCODING=:replace
    

    Note the leading colon: I didn't specify the codec, only the error handler.

  • Replace the stdout TextIOWrapper, setting a different error handler:

    import sys
    import io
    
    sys.stdout = io.TextIOWrapper(
        sys.stdout.buffer, encoding=sys.stdout.encoding, 
        errors='replace',
        line_buffering=sys.stdout.line_buffering)
    
  • Create a separate TextIOWrapper instance around sys.stdout.buffer and pass that in as the file argument when printing:

    import sys
    import io
    
    replacing_stdout = io.TextIOWrapper(
        sys.stdout.buffer, encoding=sys.stdout.encoding, 
        errors='replace',
        line_buffering=sys.stdout.line_buffering)
    
    print(s, file=replacing_stdout)
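
To see what the 'replace' error handler does without touching the real terminal, here is a minimal sketch that wraps an in-memory binary stream the same way the options above wrap sys.stdout.buffer. The helper name ascii_replacing_writer and the BytesIO stand-in are assumptions made for illustration; the ASCII codec is hard-coded to mimic an LC_ALL=C terminal.

```python
import io


def ascii_replacing_writer(buffer):
    """Wrap a binary stream in an ASCII text layer that never raises on encode.

    Hypothetical helper: mirrors the TextIOWrapper construction from the
    answer, but with the encoding fixed to ASCII for demonstration.
    """
    return io.TextIOWrapper(
        buffer,
        encoding="ascii",
        errors="replace",      # unencodable characters become '?'
        newline="\n",          # no platform newline translation on write
        line_buffering=True,
    )


raw = io.BytesIO()             # stand-in for sys.stdout.buffer
out = ascii_replacing_writer(raw)

s = u'\xe9\xe1'
print(s, file=out)             # no UnicodeEncodeError is raised
out.flush()

result = raw.getvalue()        # b'??\n'
```

Each character that ASCII cannot encode is written as a single '?', which is the same substitution the encode/decode round trip in the question produces.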
    

Upvotes: 1
