Reputation: 5444
Yet another encoding question about Python.
How can I pass non-ASCII characters as parameters in a subprocess.Popen call?
My problem is not with stdin/stdout, as in the majority of other questions on StackOverflow, but with passing those characters in the args parameter of Popen.
Python script used for testing:
import subprocess
cmd = 'C:\Python27\python.exe C:\path_to\script.py -n "Testç on ã and ê"'
process = subprocess.Popen(cmd,stdout=subprocess.PIPE,stderr=subprocess.STDOUT)
output, err = process.communicate()
result = process.wait()
print result, '-', output
For this example call, script.py receives Testç on ã and ê. If I copy-paste this same command string into a CMD shell, it works fine.
What I've tried, besides what's described above:
1. Using a unicode literal (cmd = u'...'): received a UnicodeEncodeError: 'ascii' codec can't encode character u'\xe7' in position 128: ordinal not in range(128) on line 5 (the Popen call).
2. cmd = u'...'.decode('utf-8'): received a UnicodeEncodeError: 'ascii' codec can't encode character u'\xe7' in position 128: ordinal not in range(128) on line 3 (the decode call).
3. cmd = u'...'.encode('utf8'): results in Testç on ã and ê
4. Setting the PYTHONIOENCODING=utf-8 environment variable, with no luck.
Looking at tries 2 and 3, it seems that Popen issues a decode call internally, but I don't have enough experience in Python to get any further with this suspicion.
Environment: Python 2.7.11 running on Windows Server 2012 R2.
I've searched for similar problems but haven't found a solution. A similar question is asked in "what is the encoding of the subprocess module output in Python 2.7?", but no viable solution is offered there.
I read that Python 3 changed the way strings and encodings work, but upgrading to Python 3 is not an option currently.
Thanks in advance.
Upvotes: 3
Views: 3086
Reputation: 308196
As noted in the comments, subprocess.Popen in Python 2 calls the Windows function CreateProcessA, which accepts a byte string in the currently configured code page. Luckily, Python has an encoding named mbcs, which stands in for the current code page.
cmd = u'C:\Python27\python.exe C:\path_to\script.py -n "Testç on ã and ê"'.encode('mbcs')
Unfortunately, this can still fail if the string contains characters that can't be encoded in the current code page.
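If you want to catch that case before launching the process, one option is a round-trip check. This is only a rough sketch: survives_code_page is a hypothetical helper name, and 'cp1252' is an assumed stand-in for the machine's code page so the snippet runs anywhere (on Windows you would pass 'mbcs'). It compares the round-tripped text rather than relying on an exception alone, because as far as I recall the Python 2 mbcs codec may substitute '?' for unencodable characters instead of raising.

```python
from __future__ import print_function


def survives_code_page(text, code_page):
    """Return True if `text` encodes to `code_page` and decodes back unchanged.

    Hypothetical helper, not part of subprocess or any library.
    """
    try:
        encoded = text.encode(code_page)
    except UnicodeEncodeError:
        # Strict codecs raise when a character has no representation.
        return False
    # Codecs that silently substitute characters are caught here instead.
    return encoded.decode(code_page) == text


# 'cp1252' is an assumption standing in for the current code page;
# use 'mbcs' on Windows.
print(survives_code_page(u'Test\xe7 on \xe3 and \xea', 'cp1252'))
print(survives_code_page(u'Test \u65e5\u672c\u8a9e', 'cp1252'))
```

If the check returns False, the argument cannot be passed through CreateProcessA intact on that machine, so it would have to reach the child process some other way (a file or stdin, for example).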
Upvotes: 4