Dinei

Reputation: 5444

Encoding issue on subprocess.Popen args

Yet another encoding question on Python.

How can I pass non-ASCII characters as parameters in a subprocess.Popen call?

My problem is not with stdin/stdout, as in most other questions on Stack Overflow, but with passing those characters in the args parameter of Popen.

Python script used for testing:

import subprocess

cmd = 'C:\Python27\python.exe C:\path_to\script.py -n "Testç on ã and ê"'

process = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
output, err = process.communicate()
result = process.wait()

print result, '-', output

For this example call, script.py receives Testç on ã and ê. If I copy and paste the same command string into a CMD shell, it works fine.

What I've tried, besides what's described above:

  1. Checked if all Python scripts are encoded in UTF-8. They are.
  2. Changed to unicode (cmd = u'...'), and received a UnicodeEncodeError: 'ascii' codec can't encode character u'\xe7' in position 128: ordinal not in range(128) on line 5 (the Popen call).
  3. Changed to cmd = u'...'.decode('utf-8'), and received a UnicodeEncodeError: 'ascii' codec can't encode character u'\xe7' in position 128: ordinal not in range(128) on line 3 (the decode call).
  4. Changed to cmd = u'...'.encode('utf8'), which results in Testç on ã and ê.
  5. Added the PYTHONIOENCODING=utf-8 environment variable, with no luck.

Looking at attempts 2 and 3, it seems that Popen issues a decode call internally, but I don't have enough Python experience to take this suspicion any further.

Environment: Python 2.7.11 running on Windows Server 2012 R2.

I've searched for similar problems but haven't found a solution. A similar question, "What is the encoding of the subprocess module output in Python 2.7?", offers no viable solution.

I read that Python 3 changed the way strings and encodings work, but upgrading to Python 3 is not an option at the moment.

Thanks in advance.

Upvotes: 3

Views: 3086

Answers (1)

Mark Ransom

Reputation: 308196

As noted in the comments, subprocess.Popen in Python 2 calls the Windows function CreateProcessA, which accepts a byte string encoded in the currently configured code page. Luckily, Python has an encoding named mbcs that stands in for the current code page.

cmd = u'C:\Python27\python.exe C:\path_to\script.py -n "Testç on ã and ê"'.encode('mbcs')

Unfortunately, this can still fail if the string contains characters that can't be encoded in the current code page.
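
One way to catch that failure before launching the process is to check that the command survives a round trip through the target encoding. The following is only a minimal sketch, not something from the original question: encode_command is a hypothetical helper name, and the round-trip comparison is there on the assumption that some codecs may substitute a replacement character instead of raising an error. The mbcs codec exists only on Windows, so the encoding is a parameter and can be set to an explicit code page such as cp1252 when trying this elsewhere.

```python
# -*- coding: utf-8 -*-


def encode_command(cmd, encoding='mbcs'):
    """Encode a unicode command line as bytes in the given encoding.

    Hypothetical helper for illustration. Raises ValueError when the
    text cannot be represented faithfully in that encoding.
    """
    try:
        encoded = cmd.encode(encoding)
    except UnicodeEncodeError as exc:
        bad = cmd[exc.start:exc.end]
        raise ValueError('cannot encode %r as %s' % (bad, encoding))
    # Assumption: a codec might silently substitute characters rather
    # than raise, so also verify that decoding restores the original.
    if encoded.decode(encoding) != cmd:
        raise ValueError('command changed when encoded as %s' % encoding)
    return encoded


if __name__ == '__main__':
    # cp1252 is used here only so the example also runs outside Windows.
    print(repr(encode_command(u'script.py -n "Testç on ã and ê"', 'cp1252')))
```

The returned byte string would then be passed to subprocess.Popen in place of the plain string. If the command contains a character outside the chosen code page, the helper raises ValueError up front instead of letting a mangled argument reach the child process.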

Upvotes: 4
