davidalk

Reputation: 105

Python - Get command output cannot be decoded

I'm working on a project where I need to run a command in PowerShell, and part of the output is not in English (specifically, it is in Hebrew).

For example (a simplified version of the problem), suppose I want to list the contents of my desktop and one of the file names is in Hebrew:

import subprocess
command = "powershell.exe ls ~/Desktop"
print (subprocess.run(command.split(), stdout=subprocess.PIPE).stdout.decode())

This code raises the following error (or something similar with a different byte value):

UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 19: invalid start byte

I tried running it on a different computer, and this was the output:

?????

Any idea why this happens and how I can fix it? I tried many of the suggestions from other questions, but none of them worked for me.

Upvotes: 1

Views: 1311

Answers (1)

mklement0

Reputation: 437998

Note: The following are Python 3+ solutions, but there is a caveat:

  • With the first solution below, and also with the second one if UTF-8 data must be sent to PowerShell's stdin stream, a bug in powershell.exe (the Windows PowerShell CLI) switches the current console window to a raster font (potentially with a different font size), which cannot display most Unicode characters outside the extended-ASCII range. While visually jarring, this is purely a display (rendering) problem: the data is handled correctly, and switching back to a Unicode-aware font such as Consolas reveals the correct output.

  • By contrast, pwsh.exe, the PowerShell (Core) (v6+) CLI does not exhibit this problem.
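For context, the symptoms in the question can be reproduced without PowerShell. The sketch below assumes that the first machine's console used OEM code page 862 (Hebrew) and that the second one used code page 437; neither is stated in the question, and the sample text and the helper `try_decode` are purely illustrative. Under those assumptions, it shows why decoding OEM-encoded bytes as UTF-8 fails, and one plausible way the question marks could arise.

```python
# Illustrative Hebrew text standing in for a file name in the listing.
text = "שלום"

# Assumption: the console's OEM code page is 862 (Hebrew), so this is
# roughly what the bytes captured from stdout would look like.
oem_bytes = text.encode("cp862")


def try_decode(data, encoding):
    """Return the decoded string, or the error message if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"UnicodeDecodeError: {exc}"


# Code page 862 stores Hebrew letters as single bytes in the 0x80-0x9A range,
# none of which can start a UTF-8 sequence, so decoding as UTF-8 fails.
print(try_decode(oem_bytes, "utf-8"))

# Decoding with the code page that produced the bytes round-trips the text.
print(try_decode(oem_bytes, "cp862"))

# Assumption: the other machine's OEM code page (437 here) has no Hebrew
# letters at all, so each character is replaced with "?" when encoded.
lossy = text.encode("cp437", errors="replace")
print(lossy)
```

Decoding with the matching OEM code page only helps when that code page can represent the characters in the first place, which is why the options below switch to UTF-8 instead.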


Option A: Configure both the console and Python to use UTF-8 character encoding before executing your script:

  • Configure the console to use UTF-8:

    • From cmd.exe, by switching the active OEM code page to 65001 (UTF-8); note that this change potentially affects all later calls to console applications in the session, independently of Python, unless you restore the original code page (see Option B below):

      chcp 65001
      
    • From PowerShell:

      $OutputEncoding = [Console]::InputEncoding = [Console]::OutputEncoding = [System.Text.UTF8Encoding]::new()
      
  • And configure Python (v3+) to use UTF-8 consistently:[1]

    • Set environment variable PYTHONUTF8 to 1, possibly persistently, via the registry; to do it ad hoc:

      • From cmd.exe:

        Set PYTHONUTF8=1
        
      • From PowerShell:

        $env:PYTHONUTF8=1
        
    • Alternatively, for an individual call (v3.7+): Pass command-line option -X utf8 to the python interpreter (note: case matters):

        python -X utf8 somefile.py ...
      
    • Both options enable Python UTF-8 Mode, which will become the default in Python 3.15.

Now, your original code should work as-is (except for the display bug).
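To check from inside a script whether UTF-8 Mode is actually in effect, something like the following sketch may help. The helper name `encoding_report` is made up for illustration; `sys.flags.utf8_mode` requires Python 3.7+.

```python
import locale
import sys


def encoding_report():
    """Collect the settings that determine how Python encodes and decodes text."""
    return {
        # True if PYTHONUTF8=1 or -X utf8 is in effect.
        "utf8_mode": bool(sys.flags.utf8_mode),
        # Encoding used when printing to stdout (may be None for some streams).
        "stdout": getattr(sys.stdout, "encoding", None),
        # Default encoding for open() and pipes when none is given explicitly.
        "preferred": locale.getpreferredencoding(False),
    }


for name, value in encoding_report().items():
    print(f"{name}: {value}")
```

If `utf8_mode` is reported as `False`, the environment variable or the `-X utf8` option did not reach the interpreter that runs the script.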

Note:

  • A simpler alternative via a one-time configuration step is to configure your system to use UTF-8 system-wide, in which case both the OEM and the ANSI code pages are set to 65001. However, this has far-reaching consequences - see this answer.

Option B: (Temporarily) switch to UTF-8 for the PowerShell call:

import subprocess
import sys
from ctypes import windll

# Switch Python's own standard streams to UTF-8, if necessary.
# For the standard streams, this is the in-script equivalent of setting
# environment variable PYTHONUTF8 to 1 *before* calling the script.
for stream in (sys.stdin, sys.stdout, sys.stderr):
    stream.reconfigure(encoding='utf-8')

# Save the current console output code page and switch to 65001 (UTF-8).
previous_cp = windll.kernel32.GetConsoleOutputCP()
windll.kernel32.SetConsoleOutputCP(65001)

try:
    # PowerShell now emits UTF-8-encoded output; decode it as such.
    command = "powershell.exe ls ~/Desktop"
    print(subprocess.run(command, stdout=subprocess.PIPE).stdout.decode('utf-8'))
finally:
    # Restore the previous console output code page, even if the call fails.
    windll.kernel32.SetConsoleOutputCP(previous_cp)

Note:

  • Because only the console output code page is changed, the Windows PowerShell display bug is avoided.
  • If you also wanted to send input to PowerShell's stdin stream, you'd have to set the console input code page too, via windll.kernel32.SetConsoleCP(65001) (which would then again surface the display bug).
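If the code page needs to be switched in several places, the save/switch/restore steps could be wrapped in a context manager. The following is only a sketch: the name `console_code_pages` and its parameters are invented for illustration, and the `kernel32` parameter exists solely so that the logic can be exercised without a Windows console. By default it uses `ctypes.windll.kernel32`, which is available on Windows only.

```python
import contextlib


@contextlib.contextmanager
def console_code_pages(output_cp=65001, input_cp=None, kernel32=None):
    """Temporarily switch the console code page(s) and restore them on exit."""
    if kernel32 is None:
        # Windows only: ctypes.windll does not exist on other platforms.
        import ctypes
        kernel32 = ctypes.windll.kernel32

    # Remember the current code pages so they can be restored later.
    previous_output = kernel32.GetConsoleOutputCP()
    previous_input = kernel32.GetConsoleCP()

    kernel32.SetConsoleOutputCP(output_cp)
    if input_cp is not None:
        # Only needed when data is also sent to the child's stdin.
        kernel32.SetConsoleCP(input_cp)
    try:
        yield
    finally:
        # Restore the original code pages, even if the body raised.
        kernel32.SetConsoleOutputCP(previous_output)
        if input_cp is not None:
            kernel32.SetConsoleCP(previous_input)
```

On Windows, placing the subprocess.run(...) call inside `with console_code_pages():` would then switch only the output code page to UTF-8 for the duration of the call; passing `input_cp=65001` would switch the input code page as well, at the cost of surfacing the display bug again.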

[1] This isn't strictly necessary just for correctly decoding PowerShell's output, but matters if you want to pass that output on from Python: Python 3.x defaults to the active ANSI(!) code page for encoding non-console output, which means that Hebrew characters, for instance, cannot be represented in non-console output (e.g., when redirecting to a file), and cause the script to break.
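The encoding failure described in footnote [1] can be illustrated in isolation. This sketch assumes an ANSI code page of Windows-1252, which is common on Western-locale systems but is only an assumption here; the helper `can_encode` and the sample text are made up for the example.

```python
def can_encode(text, encoding):
    """Return True if every character of text is representable in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


sample = "שלום"  # illustrative Hebrew text

# Assumption: the active ANSI code page is Windows-1252, which has no Hebrew
# letters, so writing this text to a redirected stream would fail.
print(can_encode(sample, "cp1252"))

# With UTF-8 (e.g. in Python UTF-8 Mode), the same text encodes without error.
print(can_encode(sample, "utf-8"))
```

On a system whose ANSI code page is Windows-1255 (Hebrew), this particular sample would encode successfully, but characters from other scripts would still fail, which is the general argument for using UTF-8 consistently.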

Upvotes: 3
