Reputation: 105
I'm currently working on a project where I need to run a command in PowerShell, and part of the output is not in English (specifically, Hebrew).
For example (a simplified version of the problem), suppose I want to list the contents of my desktop, and one of the filenames is in Hebrew:
import subprocess
command = "powershell.exe ls ~/Desktop"
print(subprocess.run(command.split(), stdout=subprocess.PIPE).stdout.decode())
This code raises the following error (or something similar with a different byte value):
UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 19: invalid start byte
I tried running it on a different computer, and this was the output:
?????
Any idea why this happens and how I can fix it? I tried a lot of things I saw in other questions, but none of them worked for me.
Upvotes: 1
Views: 1311
Reputation: 437998
Note: The following are Python 3+ solutions, but there is a caveat:
With the first solution below, and also with the second one (but only if UTF-8 data must be sent to PowerShell's stdin stream), a bug in powershell.exe, the Windows PowerShell CLI, causes the current console window to switch to a raster font (potentially with a different font size), which does not support most Unicode characters outside the extended ASCII range. While visually jarring, this is merely a display (rendering) problem; the data is handled correctly, and switching back to a Unicode-aware font such as Consolas reveals the correct output.
By contrast, pwsh.exe, the PowerShell (Core) (v6+) CLI, does not exhibit this problem.
Option A: Configure both the console and Python to use UTF-8 character encoding before executing your script:
Configure the console to use UTF-8:
From cmd.exe, by switching the active OEM code page to 65001 (UTF-8); note that this change potentially affects all later calls to console applications in the session, independently of Python, unless you restore the original code page (see Option B below):
chcp 65001
From PowerShell:
$OutputEncoding = [Console]::InputEncoding = [Console]::OutputEncoding = [System.Text.UTF8Encoding]::new()
And configure Python (v3+) to use UTF-8 consistently:[1]
Set environment variable PYTHONUTF8 to 1, possibly persistently, via the registry; to do it ad hoc:
From cmd.exe:
Set PYTHONUTF8=1
From PowerShell:
$env:PYTHONUTF8=1
Alternatively, for an individual call (v3.7+): Pass command-line option -X utf8 to the python interpreter (note: case matters):
python -X utf8 somefile.py ...
Both options enable Python UTF-8 Mode, which will become the default in Python 3.15.
Now, your original code should work as-is (except for the display bug).
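To check that the settings from Option A actually took effect, a small sketch like the following can be run with and without PYTHONUTF8=1 (or -X utf8). The function name describe_text_encodings is hypothetical and used only for illustration; the calls themselves are standard-library ones.

```python
import locale
import sys


def describe_text_encodings():
    """Collect the encoding settings that Python UTF-8 Mode influences."""
    return {
        # 1 when PYTHONUTF8=1 or -X utf8 is in effect, otherwise 0
        "utf8_mode": sys.flags.utf8_mode,
        # Default for open() and subprocess text mode when no encoding is given
        "preferred_encoding": locale.getpreferredencoding(False),
        # Encoding used by print(); may be None if stdout was replaced
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
    }


if __name__ == "__main__":
    for name, value in describe_text_encodings().items():
        print(f"{name}: {value}")
```

With UTF-8 Mode enabled, utf8_mode should be 1 and the preferred encoding should be reported as UTF-8; otherwise, on Windows, the preferred encoding typically reflects the active ANSI code page.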
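Note: On Windows 10 and later, it is also possible to switch to UTF-8 system-wide, which sets both the OEM and the ANSI code page to 65001. However, this has far-reaching consequences - see this answer.
Option B: (Temporarily) switch to UTF-8 for the PowerShell call: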
import sys, subprocess
from ctypes import windll  # Windows-only; provides access to kernel32
# Switch Python's own encoding to UTF-8, if necessary.
# This is the in-script equivalent of setting environment var.
# PYTHONUTF8 to 1 *before* calling the script.
sys.stdin.reconfigure(encoding='utf-8')
sys.stdout.reconfigure(encoding='utf-8')
sys.stderr.reconfigure(encoding='utf-8')
# Save the current console output code page and switch to 65001 (UTF-8)
previousCp = windll.kernel32.GetConsoleOutputCP()
windll.kernel32.SetConsoleOutputCP(65001)
# PowerShell now emits UTF-8-encoded output; decode it as such.
command = "powershell.exe ls ~/Desktop"
print(subprocess.run(command, stdout=subprocess.PIPE).stdout.decode('utf-8'))
# Restore the previous output console code page.
windll.kernel32.SetConsoleOutputCP(previousCp)
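If the PowerShell call raises an exception, the restore step in the snippet above is never reached. One possible way to guard against that is to wrap the save, switch, and restore steps in a context manager. The following is a sketch only: the helper name temporary_code_page and its parameters are hypothetical, and the getter and setter are passed in so the same helper could also be used with GetConsoleCP and SetConsoleCP.

```python
import sys
from contextlib import contextmanager


@contextmanager
def temporary_code_page(get_cp, set_cp, code_page=65001):
    """Switch to code_page for the duration of the with-block, then restore."""
    previous = get_cp()
    set_cp(code_page)
    try:
        yield previous
    finally:
        # Runs even if the body raises, so the old code page is always restored.
        set_cp(previous)


if __name__ == "__main__" and sys.platform == "win32":
    import subprocess
    from ctypes import windll

    kernel32 = windll.kernel32
    with temporary_code_page(kernel32.GetConsoleOutputCP,
                             kernel32.SetConsoleOutputCP):
        result = subprocess.run(
            ["powershell.exe", "-NoProfile", "-Command", "ls ~/Desktop"],
            stdout=subprocess.PIPE,
        )
    # PowerShell emitted UTF-8 while code page 65001 was active.
    print(result.stdout.decode("utf-8"))
```

The demo part only runs on Windows; the context manager itself has no platform dependency.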
Note: If you also need to send UTF-8 data to PowerShell via its stdin stream, you must additionally set the console input code page, via windll.kernel32.SetConsoleCP(65001) (which would then again surface the display bug).
[1] This isn't strictly necessary just for correctly decoding PowerShell's output, but matters if you want to pass that output on from Python: Python 3.x defaults to the active ANSI(!) code page for encoding non-console output, which means that Hebrew characters, for instance, cannot be represented in non-console output (e.g., when redirecting to a file) and cause the script to break.
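As a complement to footnote [1]: when the decoded output is written to a file from within Python (rather than via shell redirection), passing an explicit encoding sidesteps the ANSI code page default. A minimal sketch, where save_listing and the file name are hypothetical:

```python
from pathlib import Path


def save_listing(text, path):
    """Write text as UTF-8, independently of the active ANSI code page."""
    # Without encoding=..., the locale default would be used, which on
    # Windows is the ANSI code page unless UTF-8 Mode is enabled.
    Path(path).write_text(text, encoding="utf-8")


if __name__ == "__main__":
    save_listing("שלום.txt", "listing.txt")
    print(Path("listing.txt").read_text(encoding="utf-8"))
```

This only covers files the script opens itself; output redirected by the shell still goes through sys.stdout, which is where PYTHONUTF8 or reconfigure() matters.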
Upvotes: 3