Why does opening a subprocess with universal_newlines cause a unicode decode exception?

Question

I am using the subprocess module to run a child job, collecting its output and error streams with subprocess.PIPE. To avoid deadlock, I continually read from those streams on a separate thread. This works, except that sometimes the program crashes due to a decoding issue:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 483: ordinal not in range(128)

At a high level, I understand that Python is probably trying to convert to a string using the ASCII codec and that I need to call decode somewhere; I'm just not sure where. When I create my subprocess job, I set universal_newlines to True. I thought this meant that stdout/stderr are returned as unicode, not binary:

self.p = subprocess.Popen(self.command, shell=self.shell, stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True)

The crash happens in my reading thread function:

def standardOutHandler(standardOut):
    # Crash happens on the following line:
    for line in iter(standardOut.readline, ''):
        writerLock.acquire()
        stdout_file.write(line)
        if self.echoOutput:
            sys.stdout.write(line)
            sys.stdout.flush()
        writerLock.release()

It's not clear why readline is throwing a decoding exception here; as I stated, I thought that setting universal_newlines to True already gave me decoded data.

What is going on here and what can I do to correct this?

Here is the full traceback:

Exception in thread Thread-5:
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/threading.py", line 920, in _bootstrap_inner
    self.run()
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/threading.py", line 868, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/lzrd/my_process.py", line 61, in standardOutHandler
    for line in iter(standardOut.readline, ''):
  File "/Users/lzrd/Envs/my_env/bin/../lib/python3.4/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 483: ordinal not in range(128)

jfs · Accepted Answer

If you use universal_newlines=True, the byte stream is decoded into Unicode using the locale.getpreferredencoding(False) character encoding, which should be utf-8 on your system (check the LANG, LC_CTYPE, and LC_ALL environment variables).
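A minimal sketch of that check, to be run in the same environment (same virtualenv, same launcher) as the failing program. The helper name describe_text_encoding is hypothetical, introduced only for illustration:

```python
import locale
import os

def describe_text_encoding():
    """Return the locale's preferred encoding and the locale-related env vars."""
    env = {name: os.environ.get(name) for name in ("LANG", "LC_CTYPE", "LC_ALL")}
    return locale.getpreferredencoding(False), env

encoding, env = describe_text_encoding()
print("preferred encoding:", encoding)
for name, value in env.items():
    # None means the variable is not set in this process.
    print(name, "=", value)
```

If this reports an ASCII-only encoding (it may be spelled differently, e.g. ANSI_X3.4-1968) and the variables are unset, that would be consistent with the 'ascii' codec named in the traceback.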

If the exception persists, try your code with an empty loop body:

for line in standardOut: #NOTE: no need to use iter() idiom here on Python 3
    pass

If you still get the exception, check locale.getpreferredencoding(False) right next to the Popen() call; it is important to use exactly the same environment here. If it does not report ascii there, this might be a bug in Python.

I would understand if UnicodeDecodeError were showing utf-8 instead of ascii. In that case, you could try to decode the stream manually:

#!/usr/bin/env python3
import io
import locale
from subprocess import Popen, PIPE

with Popen(['command', 'arg 1'], stdout=PIPE, bufsize=1) as p:
    for line in io.TextIOWrapper(p.stdout,
                                 encoding=locale.getpreferredencoding(False),
                                 errors='strict'): 
        print(line, end='')

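You could experiment with the encoding and errors parameters here, e.g., set encoding='ascii', or use errors='replace' to substitute undecodable bytes (in the given character encoding) with U+FFFD, or errors='backslashreplace' (supported for decoding since Python 3.5) to show them as \xNN escape sequences (for debugging). Note that errors='namereplace' and its \N{...} escape sequences work only when encoding, not when decoding.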
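To see how the encoding and errors parameters interact without launching a child process, here is a small sketch that feeds io.TextIOWrapper from an in-memory buffer. The sample bytes and the helper read_lines are made up for illustration; 0xe2 is the first byte of the UTF-8 encoding of U+2019, which matches the byte reported in the question:

```python
import io

# Stand-in for p.stdout: bytes a child process might write.
# b"\xe2\x80\x99" is the UTF-8 encoding of U+2019 (right single quotation mark).
child_output = b"it\xe2\x80\x99s\n"

def read_lines(raw_bytes, encoding, errors="strict"):
    """Decode raw_bytes line by line through a TextIOWrapper."""
    stream = io.TextIOWrapper(io.BytesIO(raw_bytes), encoding=encoding, errors=errors)
    return list(stream)

# Strict ASCII decoding fails on the first non-ASCII byte.
try:
    read_lines(child_output, "ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc

# The same bytes decode cleanly as UTF-8.
utf8_lines = read_lines(child_output, "utf-8")

# errors='replace' keeps reading and marks each undecodable byte with U+FFFD.
replaced_lines = read_lines(child_output, "ascii", errors="replace")

print(ascii_error)
print(ascii(utf8_lines))
print(ascii(replaced_lines))
```

The results are printed with ascii() so that the demonstration itself cannot fail on a terminal whose encoding is ASCII.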
