Reputation: 15588
I am using the subprocess module to run a child job, collecting its output and error streams with subprocess.PIPE. To avoid deadlock I continually read from those streams on a separate thread. This works, except that sometimes the program crashes due to a decoding issue:
`UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 483: ordinal not in range(128)`
At a high level, I understand that Python is probably trying to convert to a string using the ASCII codec, and that I need to call decode somewhere; I'm just not sure where. When I create my subprocess job, I specify universal_newlines=True. I thought this meant that stdout/stderr would be returned as unicode, not binary:
self.p = subprocess.Popen(self.command, shell=self.shell, stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True)
The crash happens in my reading thread function:
def standardOutHandler(standardOut):
    # Crash happens on the following line:
    for line in iter(standardOut.readline, ''):
        writerLock.acquire()
        stdout_file.write(line)
        if self.echoOutput:
            sys.stdout.write(line)
            sys.stdout.flush()
        writerLock.release()
It's not clear why readline is throwing a decoding exception here; as I said, I thought universal_newlines=True meant I was already getting decoded data back.
What is going on here and what can I do to correct this?
Here is the full traceback:
Exception in thread Thread-5:
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/threading.py", line 920, in _bootstrap_inner
    self.run()
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/threading.py", line 868, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/lzrd/my_process.py", line 61, in standardOutHandler
    for line in iter(standardOut.readline, ''):
  File "/Users/lzrd/Envs/my_env/bin/../lib/python3.4/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 483: ordinal not in range(128)
Upvotes: 3
Views: 6522
Reputation: 2797
Maybe this is nicer:
process = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                           shell=True, encoding='utf-8')  # encoding= requires Python 3.6+
out, err = process.communicate()
print('out: ')
print(out)
print('err: ')
print(err)
Upvotes: 0
Reputation: 414305
If you use universal_newlines=True, then the byte stream is decoded into Unicode using the locale.getpreferredencoding(False) character encoding, which should be utf-8 on your system (check the LANG, LC_CTYPE, LC_ALL envvars).
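As a quick sanity check (a minimal sketch, not part of your code), you could print that encoding and the related envvars from the same interpreter:

import locale
import os

# what universal_newlines=True uses to decode the child's output
print(locale.getpreferredencoding(False))      # expect something like 'UTF-8'
# locale-related envvars that influence it
for name in ('LANG', 'LC_CTYPE', 'LC_ALL'):
    print(name, '=', os.environ.get(name))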
If the exception persists, try your code with an empty loop body:
for line in standardOut:  # NOTE: no need to use the iter() idiom here on Python 3
    pass
If you still get the exception and locale.getpreferredencoding(False) is not ascii when you check it near the Popen() call, then it might be a bug in Python -- it is important to use exactly the same environment here.
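For example (a rough sketch; 'your-command' is a placeholder for your self.command), put the check right next to the Popen() call:

import locale
import subprocess

# same process and environment that the child is started from
print('preferred encoding:', locale.getpreferredencoding(False))

p = subprocess.Popen(['your-command'],          # placeholder command
                     stdout=subprocess.PIPE, universal_newlines=True)
for line in p.stdout:   # empty loop body, as suggested above
    pass
p.wait()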
I would understand it if the UnicodeDecodeError were showing utf-8 instead of ascii. In that case, you could try to decode the stream manually:
#!/usr/bin/env python3
import io
import locale
from subprocess import Popen, PIPE

with Popen(['command', 'arg 1'], stdout=PIPE, bufsize=1) as p:
    for line in io.TextIOWrapper(p.stdout,
                                 encoding=locale.getpreferredencoding(False),
                                 errors='strict'):
        print(line, end='')
You could experiment with the encoding and errors parameters here, e.g., set encoding='ascii' or use errors='namereplace' to replace unsupported characters (in the given character encoding) with \N{...} escape sequences (for debugging).
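To see what such error handlers do, here is a small illustration with an assumed byte sequence (not taken from your actual output):

# b'\xe2\x86\x92' is the UTF-8 encoding of a right-arrow character
data = b'temp \xe2\x86\x92 42\n'

print(data.decode('utf-8'))                      # decodes cleanly as UTF-8
print(data.decode('ascii', 'replace'))           # offending bytes become U+FFFD
print(data.decode('ascii', 'backslashreplace'))  # ...or \xe2\x86\x92 escapes
# data.decode('ascii')                           # 'strict' raises UnicodeDecodeError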
Upvotes: 5