Rod

Reputation: 55882

Running windows batch file encoded in utf-8 using subprocess in python

When I run Windows batch files encoded in UTF-8 from Python 2.7 under Windows 7, the first command of the batch file is not recognized (see the example below).

Most likely, the BOM (byte order mark) is being interpreted as characters. How can I make the underlying shell run these batch files properly?

The batch file being called comes from a third party. Here is a simple Python script that reproduces the problem:

import codecs
import subprocess

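content = "@echo off"

# Write the batch file with a UTF-8 BOM in front of the content.
# The with block closes the file, so no explicit f.close() is needed.
with codecs.open('test_utf8.bat', 'w', 'utf-8-sig') as f:
    f.write(content)

# Write the same content without a BOM.
with open('test_ansi.bat', 'w') as f:
    f.write(content)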

print "Calling test_ansi.bat"
subprocess.call('test_ansi.bat', shell=True)

print "Calling test_utf8.bat"
subprocess.call('test_utf8.bat', shell=True)

print "Done"

Running the script gives the following output:

t:\tmp\test>python test.py
Calling test_ansi.bat
Calling test_utf8.bat

t:\tmp\test>´╗┐@echo off
'´╗┐@echo' is not recognized as an internal or external command,
operable program or batch file.
Done

t:\tmp\test>

As a note, changing the shell parameter doesn't seem to have any effect.
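The stray prefix in the output can be checked from Python. This is a minimal sketch, assuming the console was running under code page 850 (the question does not say so; the ´╗┐ prefix is what those three bytes look like in that code page). It shows that "utf-8-sig" puts the bytes EF BB BF in front of @echo, and that cmd, reading them as ordinary characters, prints exactly that prefix.

```python
# -*- coding: utf-8 -*-
import codecs

# Encoding the batch file's first line with "utf-8-sig" prepends the BOM.
first_line = u'@echo off'.encode('utf-8-sig')
print(first_line[:3] == codecs.BOM_UTF8)  # True: the bytes EF BB BF

# Assumption: the console uses code page 850. Decoding the three BOM
# bytes with that code page gives the characters cmd printed before @echo.
prefix = codecs.BOM_UTF8.decode('cp850')
print(prefix == u'\u00b4\u2557\u2510')  # True: the characters ´╗┐
```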

Upvotes: 2

Views: 6553

Answers (1)

jsbueno

Reputation: 110756

OK. I'll set aside your reasons for using Python to create batch files and run them externally instead of doing the work in Python itself, and also your reasons for wanting those batch files in UTF-8 rather than the native encoding of your Windows or your DOS console (it is not uncommon for the two to differ).

And here it is: just encode to "utf-8", not to "utf-8-sig". The latter is not an official variant; it prepends marker bytes (a BOM) that make the file open correctly in Windows Notepad. From the Python codecs documentation (http://docs.python.org/2/library/codecs.html): """To increase the reliability with which a UTF-8 encoding can be detected, Microsoft invented a variant of UTF-8 (that Python 2.5 calls "utf-8-sig") for its Notepad program: Before any of the Unicode characters is written to the file, a UTF-8 encoded BOM (which looks like this as a byte sequence: 0xef, 0xbb, 0xbf) is written. As it’s rather improbable that any charmap encoded file starts with these byte values ...""" For many other applications, however, those bytes are just garbage, and, as you can see, that includes Microsoft's own cmd.
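A minimal sketch of the difference between the two codecs: it writes the same line once with "utf-8-sig" and once with "utf-8", then reads back the raw bytes. It uses io.open (available in Python 2.6+ and in Python 3) instead of the codecs.open call from the question; the temporary directory and file names are illustrative only.

```python
import io
import os
import tempfile

content = u'@echo off'

tmpdir = tempfile.mkdtemp()
sig_path = os.path.join(tmpdir, 'test_utf8_sig.bat')
plain_path = os.path.join(tmpdir, 'test_utf8.bat')

# "utf-8-sig": the encoder writes the BOM (EF BB BF) before the text.
with io.open(sig_path, 'w', encoding='utf-8-sig') as f:
    f.write(content)

# Plain "utf-8": the file starts directly with "@echo off".
with io.open(plain_path, 'w', encoding='utf-8') as f:
    f.write(content)

# Read both files back as raw bytes to compare their first bytes.
with io.open(sig_path, 'rb') as f:
    sig_bytes = f.read()
with io.open(plain_path, 'rb') as f:
    plain_bytes = f.read()

print(sig_bytes.startswith(b'@echo'))    # False: the BOM comes first
print(plain_bytes.startswith(b'@echo'))  # True
```

Only the plain "utf-8" file begins with @echo, which is what cmd expects as the first command.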

In short: encode to "utf-8". If you want to edit the files in Windows after they are generated, use a real text editor rather than Notepad, which has remained mostly unchanged since the Windows 3.0 days. (I wonder whether it can open files larger than 64 kB nowadays.)
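Since the batch file in the question comes from a third party and cannot simply be regenerated with "utf-8", one option is to run a copy with the BOM removed. This is a sketch under that assumption: strip_utf8_bom and the file names are hypothetical, and the demo file stands in for the third-party batch file.

```python
import codecs
import io
import os
import tempfile


def strip_utf8_bom(src_path, dst_path):
    """Hypothetical helper: copy src_path to dst_path without a leading UTF-8 BOM.

    All other bytes are copied unchanged, so the batch file's own
    encoding and line endings are preserved.
    """
    with io.open(src_path, 'rb') as f:
        data = f.read()
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    with io.open(dst_path, 'wb') as f:
        f.write(data)
    return dst_path


# Stand-in for the third-party batch file (assumption: it starts with a BOM).
tmpdir = tempfile.mkdtemp()
original = os.path.join(tmpdir, 'third_party.bat')
with io.open(original, 'wb') as f:
    f.write(codecs.BOM_UTF8 + b'@echo off\r\necho hello\r\n')

# Write the cleaned copy next to the original.
fixed = strip_utf8_bom(original, os.path.join(tmpdir, 'third_party_nobom.bat'))
with io.open(fixed, 'rb') as f:
    fixed_bytes = f.read()
print(fixed_bytes.startswith(b'@echo off'))  # True

# On Windows, the cleaned copy can then be run as in the question:
# subprocess.call(fixed, shell=True)
```

Writing the copy into the same directory as the original keeps references such as %~dp0 (the batch file's own drive and path) pointing at the same place.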

Upvotes: 3
