Reputation: 41
I am most likely missing something really easy, but I can't wrap my head around why what seems to work for everyone else doesn't work for me.
Goal: I want to run shell commands whose native output contains non-English characters, capture the output in a variable, then print it to the screen.
Problem: In the captured output, every non-English character is replaced with a ? mark.
Thoughts: Is this an encoding issue? I am running Python 3.8, so it shouldn't be! I'm on Windows 10, but the same thing happens on Windows 7 and Server 2008.
>>> p=subprocess.run("dir",shell=True,encoding="utf8")
Volume in drive C has no label.
Volume Serial Number is A22B-FA10
Directory of C:\Users\jeronimo\Documents\Github
04/24/2021 08:17 AM <DIR> .
04/24/2021 08:17 AM <DIR> ..
07/21/2020 09:37 PM <DIR> scripts
04/24/2021 08:09 AM <DIR> **Администратор**
1 File(s) 295 bytes
11 Dir(s) 151,978,950,656 bytes free
>>> p=subprocess.run("dir",capture_output=True,shell=True,encoding="utf8")
>>> p.stdout
' Volume in drive C has no label.\n Volume Serial Number is A22B-FA10\n\n Directory of C:\\Users\\jeronimo\\Documents\\Github\n\n04/24/2021 08:17 AM <DIR> .\n04/24/2021 08:17 AM <DIR> ..\n05/18/2020 01:24 PM scripts\n04/24/2021 08:09 AM <DIR> **?????????????**\n 1 File(s) 295 bytes\n 11 Dir(s) 151,976,796,160 bytes free\n'
>>> print(p.stdout)
Volume in drive C has no label.
Volume Serial Number is A22B-FA10
Directory of C:\Users\jeronimo\Documents\Github
04/24/2021 08:17 AM <DIR> .
04/24/2021 08:17 AM <DIR> ..
07/21/2020 09:37 PM <DIR> scripts
04/24/2021 08:09 AM <DIR> **?????????????**
1 File(s) 295 bytes
11 Dir(s) 151,976,796,160 bytes free
EDIT: I've tried piping out to a file:
>>> f=open('file','a+',encoding='utf-8')
>>> p=subprocess.call("dir",shell=True,encoding="utf8",stdout=f)
>>> f.close()
Volume in drive C has no label.
Volume Serial Number is A22B-FA10
Directory of C:\Users\jeronimo\Documents\Github
04/24/2021 11:49 AM <DIR> .
04/24/2021 11:49 AM <DIR> ..
07/21/2020 09:37 PM <DIR> scripts
04/24/2021 08:09 AM <DIR> ?????????????
1 File(s) 0 bytes
11 Dir(s) 151,974,350,848 bytes free
I've tried many variations of subprocess (Popen, run, check_output, call) and all give the same result. What the heck am I doing wrong?
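For reference, the symptom can be reproduced without subprocess at all. This is a minimal sketch, assuming the console's OEM code page is 437 and that the substitution happens on the cmd.exe side before Python ever sees the bytes. Both are assumptions, not something verified on the machines above.

```python
# Assumption: the console's OEM code page is 437 (common on US-English Windows).
name = "Администратор"

# cp437 has no Cyrillic letters, so each one is replaced with "?" while encoding.
as_cp437 = name.encode("cp437", errors="replace")
print(as_cp437)  # b'?????????????'

# The original letters are already gone at this point, so no choice of
# encoding on the Python side can bring them back.
print(as_cp437.decode("utf-8"))
```

If that is what is happening here, the encoding="utf8" argument is not the culprit. The bytes arriving from the pipe would already contain literal question marks.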
Upvotes: 1
Views: 2689
Reputation: 1797
A quick 'module' for this task. It should work on any Windows version.
I noticed that the Windows encoding names all start with cp and end with a number. We can get the current one by typing chcp in cmd.exe. Go ahead, try it out.
The output should look like this in Russian:
Текущая кодовая страница: 866
or this (utf-8):
Active code page: 65001
The number will be in the stdout of the subprocess.run call. We do not care about the letters (they will be unreadable anyway, since we do not yet know the encoding), but we can extract the number with _REGEX and store it in a module-wide cache variable, _CP_CODE.
After this we know the encoding, and we can use the run function without any problems. Whenever stdout and stderr are captured, they come back as properly decoded strings.
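import subprocess
import re

# Cached code page number; filled in on the first call to run().
_CP_CODE = None
# Matches the trailing number in output such as b"Active code page: 866".
_REGEX = re.compile(br".+: (\d+)\s*$")


def get_cp_code():
    """Ask cmd.exe for the active code page and return it as an int."""
    stdout = subprocess.run(
        "chcp", shell=True,
        stdout=subprocess.PIPE, stderr=subprocess.DEVNULL,
        stdin=subprocess.DEVNULL
    ).stdout
    result = _REGEX.search(stdout)
    if result is None:
        raise ValueError(stdout)
    return int(result.group(1))


def run(cmd, **kwargs):
    """Like subprocess.run, but decodes output with the console code page."""
    global _CP_CODE
    if _CP_CODE is None:
        _CP_CODE = get_cp_code()
    return subprocess.run(
        cmd,
        shell=True,
        encoding=f'cp{_CP_CODE}',
        **kwargs
    )


if __name__ == "__main__":
    command = "TASKKILL /F /PID 12345 /T"
    # Capture the streams; without this, res.stderr would be None.
    res = run(command, capture_output=True)
    print(res.stderr)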
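To see why the regex step works even before the encoding is known, here is a small sketch that runs the same pattern over the two sample outputs shown earlier. The helper name parse_cp_code and the trailing "\r\n" on the samples are assumptions made for illustration. The Russian sample is encoded as cp866 by hand to mimic what the pipe might deliver.

```python
import re

# Same pattern as in the module above: everything up to ": ", then the digits.
_REGEX = re.compile(br".+: (\d+)\s*$")


def parse_cp_code(raw: bytes) -> int:
    """Hypothetical helper: pull the code page number out of raw chcp output."""
    match = _REGEX.search(raw)
    if match is None:
        raise ValueError(raw)
    # int() accepts ASCII digit bytes directly, so no decoding is needed.
    return int(match.group(1))


# Assumed samples; a real chcp call may format its output slightly differently.
russian = "Текущая кодовая страница: 866\r\n".encode("cp866")
english = b"Active code page: 65001\r\n"

print(parse_cp_code(russian))  # 866
print(parse_cp_code(english))  # 65001

# Once the number is known, the bytes can finally be decoded properly.
print(russian.decode(f"cp{parse_cp_code(russian)}").strip())
```

The colon, the space and the digits are plain ASCII in both samples, which is why the pattern can work on raw bytes and ignore the unreadable letters in front.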
Upvotes: 1
Reputation: 41
Solved it by changing the console code page before running subprocess AND specifying UTF-8 encoding in the subprocess call:
os.system('chcp 65001')
output = subprocess.run(data, timeout=10, encoding="utf8", shell=True,
                        stdin=subprocess.DEVNULL,
                        stderr=subprocess.PIPE, stdout=subprocess.PIPE)
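A possible variation, sketched here and not tested on the machines mentioned in the question, is to switch the code page inside the same cmd.exe invocation instead of calling os.system first. The helper names with_utf8_code_page and run_utf8 are made up for this example.

```python
import subprocess
import sys


def with_utf8_code_page(cmd: str) -> str:
    """Prefix cmd so that it runs right after `chcp 65001` in the same shell."""
    # ">nul" hides the "Active code page: 65001" message from the captured output.
    return f"chcp 65001 >nul && {cmd}"


def run_utf8(cmd: str, **kwargs) -> subprocess.CompletedProcess:
    """Hypothetical wrapper: run cmd under code page 65001 and decode as UTF-8."""
    return subprocess.run(
        with_utf8_code_page(cmd),
        shell=True,
        encoding="utf-8",
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        **kwargs,
    )


# chcp only exists on Windows, so the demo call is skipped elsewhere.
if sys.platform == "win32":
    print(run_utf8("dir", timeout=10).stdout)
```

This keeps the code page switch and the command in one place, so the wrapper does not depend on an earlier os.system call having been made.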
Upvotes: 3