Harper

Reputation: 1223

Decoding error while decoding stdout from subprocess.Popen

bytes.decode() raises an error when I try to decode a line of output read through stdout=subprocess.PIPE. The error message is:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x84 in position 8: invalid start byte

Byte 0x84 should be the letter 'ä'. The line that fails is:

b' Datentr\x84ger in Laufwerk C: ist System'
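To see why UTF-8 rejects this line, the failing bytes can be decoded directly. This is a minimal reproduction using only the bytes shown above. Index 8 holds 0x84 (binary 10000100). In UTF-8, bytes of the form 10xxxxxx may only continue a multi-byte sequence, so 0x84 cannot start a character. That is exactly what "invalid start byte" means.

```python
# Reproduce the failure with the exact bytes from the failing line.
raw = b' Datentr\x84ger in Laufwerk C: ist System'

try:
    raw.decode('utf-8')
    error = None
except UnicodeDecodeError as err:
    error = err  # keep a reference; 'err' is unbound once the except block ends

# Prints: 'utf-8' codec can't decode byte 0x84 in position 8: invalid start byte
print(error)
```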

I can't pin it down. I already checked the encoding with sys.stdout.encoding, which reports utf-8.

import subprocess
import re

# Start cmd.exe and feed it a "dir" command through stdin.
prc = subprocess.Popen(["cmd.exe"], shell=False, stdout=subprocess.PIPE, stdin=subprocess.PIPE)
prc.stdin.write(b"dir\n")
outp, errp = prc.communicate()  # returns (stdout, stderr) and closes stdin

regex = re.compile(r"^.*(\d\d:\d\d).*$")  # capture an HH:MM timestamp

for line in outp.splitlines():
    match = regex.match(line.decode('utf-8'))  # <--- decode fails here
    if match:
        print(match.groups())
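For comparison, here is a minimal sketch of the same parsing loop that decodes with a console code page instead of UTF-8. It assumes the console uses code page 850, which is common on German Windows (the chcp command shows the actual one). The sample bytes are made up to mimic dir output, and extract_times is a hypothetical helper name. In the real program, Python 3.6+ also accepts an encoding argument on subprocess.Popen, so that communicate() returns str directly.

```python
import re

# Same pattern as in the question: capture an HH:MM timestamp.
TIME_RE = re.compile(r"^.*(\d\d:\d\d).*$")

def extract_times(raw: bytes, encoding: str = "cp850") -> list[str]:
    """Decode raw console output with the given code page, then pull out times.

    The default 'cp850' is an assumption; pass the console's real code page.
    """
    text = raw.decode(encoding)  # decode once, before splitting into lines
    times = []
    for line in text.splitlines():  # handles the \r\n line endings cmd.exe emits
        match = TIME_RE.match(line)
        if match:
            times.append(match.group(1))
    return times

# Made-up sample resembling two lines of "dir" output.
sample = (b" Datentr\x84ger in Laufwerk C: ist System\r\n"
          b"13.05.2019  10:42    <DIR>          Users\r\n")
print(extract_times(sample))
```

Decoding the whole output before calling splitlines() keeps the bytes-to-text step in one place, so changing the code page means touching a single argument.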

Upvotes: 0

Views: 1447

Answers (2)

x squared

Reputation: 3354

If you don't know the encoding, the cleanest way to solve this is to pass the errors parameter to bytes.decode, e.g.:

import subprocess
p = subprocess.run(['echo', b'Evil byte: \xe2'], stdout=subprocess.PIPE)
p.stdout.decode(errors='backslashreplace')

Output:

'Evil byte: \\xe2\n'

The list of possible values can be found here: https://docs.python.org/3/library/codecs.html#codecs.register_error
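As a quick illustration of what the common handlers do, here they are applied to the byte from the question. This is a sketch using only built-in error handler names. None of them recovers the 'ä': 'replace' substitutes U+FFFD, 'backslashreplace' keeps an escaped form of the byte, and 'ignore' silently drops it. They avoid the exception, but they are a fallback for when the real encoding is unknown.

```python
raw = b' Datentr\x84ger'

# Decode as UTF-8 with each handler; 0x84 is the byte UTF-8 rejects.
results = {
    handler: raw.decode('utf-8', errors=handler)
    for handler in ('replace', 'backslashreplace', 'ignore')
}
for handler, text in results.items():
    print(f"{handler:>16}: {text!r}")
```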

Upvotes: 0

Harper

Reputation: 1223

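CMD writes its output in the console's OEM code page, not UTF-8. On a German Windows install this is typically cp850 (run chcp to check), where byte 0x84 is 'ä'. So the bytes that come through the pipe need to be decoded with that code page, even though sys.stdout.encoding reports utf-8: that value only describes Python's own output stream, not what cmd.exe writes.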
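Decoding the failing bytes with a few single-byte code pages shows which one matches the expected 'ä'. This sketch uses only codecs that ship with Python. ISO-8859-15 decodes without an error, but it maps 0x84 to the control character U+0084. cp1252 gives '„', and only cp850 yields 'ä'. On Windows, Python 3.6+ also provides an 'oem' codec that follows the current OEM code page. That codec is Windows-only, so it is left out here.

```python
raw = b' Datentr\x84ger in Laufwerk C: ist System'

# Decode the same bytes with several single-byte code pages.
decoded = {codec: raw.decode(codec) for codec in ('iso-8859-15', 'cp1252', 'cp850')}
for codec, text in decoded.items():
    print(f"{codec:>12}: {text!r}")
```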

Upvotes: 2
