Reputation: 29645
Here is a little program:
import sys
f = sys.argv[1]
print type(f)
print u"f=%s" % (f)
Here is what happens when I run it:
$ python x.py 'Recent/רשימת משתתפים.LNK'
<type 'str'>
Traceback (most recent call last):
File "x.py", line 5, in <module>
print u"f=%s" % (f)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd7 in position 7: ordinal not in range(128)
$
The problem is that Python treats sys.argv[1] as an ASCII string, which it can't convert to Unicode. But I'm using a Mac with a fully Unicode-aware Terminal, so x.py is actually receiving Unicode text. How do I tell Python that sys.argv[] is Unicode and not ASCII? Failing that, how do I convert ASCII (that has Unicode inside it) into Unicode? The obvious conversions don't work.
Upvotes: 16
Views: 12676
Reputation: 414199
The UnicodeDecodeError you see occurs because you are mixing the Unicode string u"f=%s" with the sys.argv[1] bytestring, which Python 2 then tries to decode implicitly as ASCII. Make both operands the same type:
both bytestrings:
$ python2 -c'import sys; print "f=%s" % (sys.argv[1],)' 'Recent/רשימת משתתפים'
This passes bytes transparently from/to your terminal. It works for any encoding.
both Unicode:
$ python2 -c'import sys; print u"f=%s" % (sys.argv[1].decode("utf-8"),)' 'Rec..
Here you should replace 'utf-8' with the encoding your terminal uses. You might use sys.getfilesystemencoding() here if the terminal is not Unicode-aware.
Both commands produce the same output:
f=Recent/רשימת משתתפים
In general you should convert bytestrings that you consider to be text to Unicode as soon as possible.
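As a rough illustration of "decode as soon as possible", the sketch below converts a bytestring to Unicode once, at the boundary, and uses only Unicode afterwards. The helper name to_text and the hard-coded UTF-8 default are assumptions made for this example, not part of the answer above; substitute your terminal's actual encoding. The argument is simulated rather than read from sys.argv so the snippet behaves the same on Python 2 and 3.

```python
# -*- coding: utf-8 -*-


def to_text(value, encoding="utf-8"):
    """Return value as a Unicode string, decoding only if it is a bytestring."""
    # On Python 2, `bytes` is an alias for `str`, so this check covers sys.argv items.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Simulate the raw bytes a UTF-8 terminal would pass for the argument.
raw_arg = u"Recent/רשימת משתתפים.LNK".encode("utf-8")

# Decode once at the boundary, then work only with Unicode strings.
text_arg = to_text(raw_arg)
message = u"f=%s" % (text_arg,)
```

Because message is built from two Unicode operands, no implicit ASCII decoding takes place.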
Upvotes: 21
Reputation: 4936
import sys
sys.argv = map(lambda arg: arg.decode(sys.stdout.encoding), sys.argv)
Alternatively, take the encoding from locale.getdefaultlocale()[1].
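One caveat worth sketching: sys.stdout.encoding can be None (for example when output is piped on Python 2), so a fallback chain may be safer. The helper names guess_argv_encoding and decode_args below are hypothetical, and the sketch swaps in locale.getpreferredencoding(False) for the locale lookup, with UTF-8 as an assumed last resort.

```python
import locale
import sys


def guess_argv_encoding():
    """Pick an encoding for command-line arguments, falling back to UTF-8."""
    return (
        getattr(sys.stdout, "encoding", None)  # may be None or missing
        or locale.getpreferredencoding(False)  # locale's preferred encoding
        or "utf-8"                             # assumed last-resort default
    )


def decode_args(args, encoding=None):
    """Decode bytestring arguments; leave already-decoded text untouched."""
    encoding = encoding or guess_argv_encoding()
    return [a.decode(encoding) if isinstance(a, bytes) else a for a in args]
```

A call such as decode_args(sys.argv) returns a new list instead of overwriting sys.argv in place, which keeps the original bytes available if the guess turns out to be wrong.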
Upvotes: 5
Reputation: 4109
Try either:
f = sys.argv[1].decode('utf-8')
or:
f = unicode(sys.argv[1], 'utf-8')
Upvotes: 3
Reputation: 5601
sys.argv is never "in Unicode"; it is always encoded. Unicode is not an encoding but a set of code points (numbers), each of which uniquely represents a character. See http://www.unicode.org/standard/WhatIsUnicode.html
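To make the distinction concrete, here is a small sketch using the first Hebrew letter of the path in the question. The variable names are made up for illustration. One code point yields different byte sequences under different encodings, and its UTF-8 form starts with the 0xd7 byte reported in the traceback.

```python
ch = u"\u05e8"  # HEBREW LETTER RESH, first non-ASCII character of the path

code_point = ord(ch)                  # the abstract number: 0x05E8
utf8_bytes = ch.encode("utf-8")       # two bytes: 0xD7 0xA8
utf16_bytes = ch.encode("utf-16-le")  # two different bytes: 0xE8 0x05

# Decoding with the matching encoding recovers the same code point.
round_trip = utf8_bytes.decode("utf-8")
```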
Go to Terminal.app > Terminal > Preferences > Settings > Character encoding, and select UTF-8 from the drop-down list.
Also, the default Python that ships with Mac OS X has one flaw with regard to Unicode: it's built using the deprecated UCS-2 by default; see: http://webamused.wordpress.com/2011/01/31/building-64-bit-python-python-org-using-ucs-4-on-mac-os-x-10-6-6-snow-leopard/
Upvotes: 2
Reputation:
Command-line parameters are passed into Python as bytestrings, encoded with whatever encoding is used by the shell that started Python. So there is no way to receive command-line parameters as Unicode strings other than converting them to Unicode yourself inside your application.
Upvotes: 3