Reputation: 2162
Take this example snippet.
import subprocess
import os
env = os.environ.copy()
env["FOO"] = u"foo"
subprocess.check_call(["ls", "-l"], env=env)
On Windows, this fails.
C:\Python27\python.exe test.py
Traceback (most recent call last):
  File "test.py", line 7, in <module>
    subprocess.check_call(["ls", "-l"], env=env)
  File "C:\Python27\lib\subprocess.py", line 535, in check_call
    retcode = call(*popenargs, **kwargs)
  File "C:\Python27\lib\subprocess.py", line 522, in call
    return Popen(*popenargs, **kwargs).wait()
  File "C:\Python27\lib\subprocess.py", line 710, in __init__
    errread, errwrite)
  File "C:\Python27\lib\subprocess.py", line 958, in _execute_child
    startupinfo)
TypeError: environment can only contain strings
sys.path is documented to be perfectly ok with unicode. What is the correct way to deal with this (and similar code) so that everything works as expected? The obvious solution is to call .encode() on the unicode path, but I'm not sure if that will lead to unexpected behaviours.
Upvotes: 3
Views: 874
Reputation: 15388
On Windows, passing an environment dict to subprocess.check_call() boils down to passing the environment to CreateProcess(). That one can actually take unicode strings (in its CreateProcessW() incarnation).
However, from Python 2.7's _subprocess.c:
/* TODO: handle unicode command lines? */
/* TODO: handle unicode environment? */
So you are not the first to think of the problem.
There is also no general solution to your problem, because the environment is interpreted by the called process, and some variables are also interpreted automatically by the system or system libraries. So the correct encoding depends on what the target process expects.
Unfortunately, while Python 2 on Windows does handle Unicode, it actually passes zero-terminated narrow character strings (i.e. PyString_AS_STRING() returns char *) on to the system functions.
Now, how does Windows itself handle the two different versions of environment variables, since it is evidently possible to pass either wide or narrow environment strings?
The target process has access only to GetEnvironmentStrings(), which returns either wide or narrow characters depending on whether the application was compiled with Unicode support.
So what happens when you call CreateProcess() from a narrow (ANSI) process to launch a Unicode process? The same thing that happens to all arguments: they get decoded in the caller's codepage and transformed to Windows' version of UCS-2 wide characters.
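To make the decoding step concrete, here is a rough simulation in pure Python. It does not call any Windows API; it only mimics the narrow-to-wide conversion by decoding bytes in an assumed ANSI codepage. The codepage cp1252 and the helper name as_seen_by_unicode_child are assumptions chosen for illustration.

```python
# Simulation only: mimics the narrow-to-wide conversion described above.
# Assumption: the caller's ANSI codepage is cp1252 (it varies per system).
ANSI_CODEPAGE = "cp1252"


def as_seen_by_unicode_child(raw_bytes, codepage=ANSI_CODEPAGE):
    """Decode narrow bytes the way a conversion in `codepage` would."""
    return raw_bytes.decode(codepage)


value = u"caf\xe9"  # "cafe" with an acute accent on the e

# Encoded in the codepage the conversion assumes: the text survives.
good = as_seen_by_unicode_child(value.encode(ANSI_CODEPAGE))

# Encoded as UTF-8 instead: each UTF-8 byte is decoded as a separate
# cp1252 character, so the child would see mojibake.
bad = as_seen_by_unicode_child(value.encode("utf-8"))
```

Here good equals the original text, while bad comes out as u"caf\xc3\xa9", which is why the choice of encoding on the Python side matters.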
So the correct way is probably to encode with the system codepage, because only then will the strings actually appear correctly in a Unicode target process. This, of course, prevents you from using characters not in that codepage.
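A minimal sketch of that approach, assuming that locale.getpreferredencoding() reports the ANSI codepage on your system (worth verifying on your setup). The helper name encode_env is hypothetical and not part of subprocess.

```python
import locale

TEXT_TYPE = type(u"")  # unicode on Python 2, str on Python 3


def encode_env(env, encoding=None):
    """Return a copy of env with text keys and values encoded to byte strings.

    Encoding is strict, so a character that the codepage cannot represent
    raises UnicodeEncodeError instead of being silently replaced.
    """
    if encoding is None:
        # Assumption: the preferred locale encoding is the ANSI codepage.
        encoding = locale.getpreferredencoding()

    def to_bytes(item):
        # Items that are already byte strings pass through unchanged.
        return item.encode(encoding) if isinstance(item, TEXT_TYPE) else item

    return dict((to_bytes(key), to_bytes(val)) for key, val in env.items())
```

With the snippet from the question on Python 2, this would be used as subprocess.check_call(["ls", "-l"], env=encode_env(env)). Failing loudly on unencodable characters is a deliberate choice here: a silently replaced character would reach the child process as a different value.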
So yes, Unicode environments on Python 2 are more or less broken.
Upvotes: 5