Reputation: 1223
bytes.decode() raises an error when I try to decode a line of output read from a subprocess pipe (stdout=subprocess.PIPE). The error message is:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x84 in position 8: invalid start byte
0x84 should be the letter 'ä'. The line that fails reads as follows:
b' Datentr\x84ger in Laufwerk C: ist System'
I can't nail it down. I already checked the encoding using sys.stdout.encoding, which is utf-8.
import subprocess
import re

prc = subprocess.Popen(["cmd.exe"], shell=False, stdout=subprocess.PIPE, stdin=subprocess.PIPE)
prc.stdin.write(b"dir\n")
outp, inp = prc.communicate()
regex = re.compile(r"^.*(\d\d:\d\d).*$")
for line in outp.splitlines():
    match = regex.match(line.decode('utf-8'))  # <--- decode fails here.
    if match:
        print(match.groups())
prc.stdin.close()
Upvotes: 0
Views: 1447
Reputation: 3354
If you don't know the encoding, the cleanest way to solve this is to specify the errors parameter of bytes.decode, e.g.:
import subprocess
p = subprocess.run(['echo', b'Evil byte: \xe2'], stdout=subprocess.PIPE)
p.stdout.decode(errors='backslashreplace')
Output:
'Evil byte: \\xe2\n'
The list of possible values can be found here: https://docs.python.org/3/library/codecs.html#codecs.register_error
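To compare a few of those handlers side by side, here is a minimal sketch using the same sample bytes as above. The helper name decode_with is made up for illustration, and the input is assumed to be decoded as UTF-8:

```python
# Same sample bytes as in the example above: 0xE2 is not valid UTF-8 here.
raw = b"Evil byte: \xe2\n"

def decode_with(handler):
    # Hypothetical helper: decode as UTF-8 using the given error handler.
    return raw.decode("utf-8", errors=handler)

ignored = decode_with("ignore")            # silently drops the bad byte
replaced = decode_with("replace")          # inserts U+FFFD in its place
escaped = decode_with("backslashreplace")  # keeps the byte visible as \xe2

for name, text in (("ignore", ignored), ("replace", replaced), ("backslashreplace", escaped)):
    print(f"{name:17} {text!r}")
```

Note that none of these recovers the original character. They only stop the exception, so they are a fallback for when the real encoding cannot be determined.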
Upvotes: 0
Reputation: 1223
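CMD does not emit UTF-8. It encodes text using the console's legacy single-byte OEM code page (the byte 0x84 is 'ä' in cp850 and cp437, whereas in ISO-8859-15 'ä' would be 0xE4). So the text that comes through the pipe needs to be decoded using that code page, even though Python itself uses utf-8 for its own stdout; sys.stdout.encoding says nothing about the child process.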
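A quick way to check which encoding fits is to decode the failing line from the question with a few candidates. This is only a sketch: cp850 is an assumption about a German Windows console (the real code page can differ per machine), and try_decode is a hypothetical helper.

```python
# The failing line, copied from the question.
raw_line = b" Datentr\x84ger in Laufwerk C: ist System"

def try_decode(data, encoding):
    # Hypothetical helper: return the decoded text, or a marker on failure.
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"<failed: {exc.reason}>"

# Candidate encodings; cp850 is an assumed OEM code page, not a confirmed one.
results = {enc: try_decode(raw_line, enc) for enc in ("utf-8", "iso-8859-15", "cp850")}

for enc, text in results.items():
    print(f"{enc:12} {text!r}")
```

Only the candidate that yields a readable 'ä' is plausible. A single-byte codec can decode without raising and still produce the wrong character, so it is worth inspecting the result. If the code page is known, subprocess.Popen and subprocess.run also accept an encoding argument (Python 3.6+), so the output arrives already decoded as str.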
Upvotes: 2