vy32

Reputation: 29645

How do I tell Python that sys.argv is in Unicode?

Here is a little program:

import sys

f = sys.argv[1]
print type(f)
print u"f=%s" % (f)

Here is my running of the program:

$ python x.py 'Recent/רשימת משתתפים.LNK'
<type 'str'>
Traceback (most recent call last):
  File "x.py", line 5, in <module>
    print u"f=%s" % (f)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd7 in position 7: ordinal not in range(128)
$ 

The problem seems to be that Python treats sys.argv[1] as an ASCII string, which it can't convert to Unicode. But I'm using a Mac with a fully Unicode-aware Terminal, so x.py is actually receiving Unicode text. How do I tell Python that sys.argv[] is Unicode and not ASCII? Failing that, how do I convert ASCII (that has Unicode inside it) into Unicode? The obvious conversions don't work.

Upvotes: 16

Views: 12676

Answers (5)

jfs

Reputation: 414199

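The UnicodeDecodeError you see occurs because you are mixing the Unicode string u"f=%s" with the bytestring sys.argv[1]. Python 2 then tries to decode the bytestring with the default ASCII codec, which fails on the first non-ASCII byte. Use the same type on both sides: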

  • both bytestrings:

      $ python2 -c'import sys; print "f=%s" % (sys.argv[1],)' 'Recent/רשימת משתתפים'
    

    This passes bytes transparently from/to your terminal. It works for any encoding.

  • both Unicode:

      $ python2 -c'import sys; print u"f=%s" % (sys.argv[1].decode("utf-8"),)' 'Rec..
    

    Here you should replace 'utf-8' with the encoding your terminal uses. You might use sys.getfilesystemencoding() here if the terminal is not Unicode-aware.

Both commands produce the same output:

f=Recent/רשימת משתתפים

In general you should convert bytestrings that you consider to be text to Unicode as soon as possible.
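To make the difference concrete, here is a minimal sketch of decoding at the boundary. The helper name decode_args and the "utf-8" default are my own assumptions for illustration, and the byte string stands in for a real argument. It is written so that it runs on both Python 2 and Python 3:

```python
# Sketch only: decode_args and the "utf-8" default are illustrative assumptions.

def decode_args(raw_args, encoding="utf-8"):
    """Return the arguments as Unicode text, decoding any byte strings."""
    return [a.decode(encoding) if isinstance(a, bytes) else a
            for a in raw_args]

# "Recent/" followed by two UTF-8 encoded Hebrew letters, as Python 2 sees them.
raw = b"Recent/\xd7\xa8\xd7\xa9"

# Decoding with the ASCII codec (what the implicit conversion does) fails.
try:
    raw.decode("ascii")
    implicit_ok = True
except UnicodeDecodeError:
    implicit_ok = False

# Decoding explicitly with the right encoding succeeds.
text = decode_args([raw])[0]
message = u"f=%s" % text
```

After this step everything downstream works with Unicode text, and you encode again only when writing output.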

Upvotes: 21

sherpya

Reputation: 4936

import sys

# Python 2: decode every argument using the encoding of the terminal
sys.argv = [arg.decode(sys.stdout.encoding) for arg in sys.argv]

Alternatively, you can take the encoding from locale.getdefaultlocale()[1].
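One caveat: in Python 2, sys.stdout.encoding can be None when output is redirected to a file or pipe, so it may help to have a fallback. A rough sketch follows; the function name pick_encoding and the "utf-8" default are assumptions of mine, and I use locale.getpreferredencoding(False) in place of getdefaultlocale() as the second guess:

```python
import codecs
import locale
import sys

def pick_encoding(stream=None, default="utf-8"):
    """Best-effort guess at the encoding to use for command-line text."""
    stream = sys.stdout if stream is None else stream
    candidates = (
        getattr(stream, "encoding", None),   # may be None (e.g. piped output in Python 2)
        locale.getpreferredencoding(False),  # locale setting, without changing the locale
        default,                             # assumed last resort
    )
    for name in candidates:
        if not name:
            continue
        try:
            codecs.lookup(name)              # skip names Python does not know
        except LookupError:
            continue
        return name
    return default
```

You would then decode with arg.decode(pick_encoding()) instead of relying on sys.stdout.encoding alone.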

Upvotes: 5

DmitrySandalov

Reputation: 4109

Try either:

f = sys.argv[1].decode('utf-8')

or:

f = unicode(sys.argv[1], 'utf-8')

Upvotes: 3

mkelley33

Reputation: 5601

  1. sys.argv is never "in Unicode". It is always encoded somehow, but Unicode is not an encoding; it is a set of code points (numbers), each of which uniquely represents a character. http://www.unicode.org/standard/WhatIsUnicode.html

  2. Go to Terminal.app > Terminal > Preferences > Settings > Character encoding, and select UTF-8 from the drop-down list.

  3. Also, the default Python that ships with Mac OS X has one flaw with regard to Unicode: it's built using the deprecated UCS-2 by default; see: http://webamused.wordpress.com/2011/01/31/building-64-bit-python-python-org-using-ucs-4-on-mac-os-x-10-6-6-snow-leopard/

Upvotes: 2

user2665694


Command-line parameters are passed into Python as byte strings, in whatever encoding the shell that started Python uses. So the only way to get them as unicode strings is to convert them to unicode yourself inside your application.
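For readers on Python 3 (an assumption on my part; the question is about Python 2): there the interpreter does this conversion for you, and sys.argv already holds text strings. On Unix, the documented way to get the raw bytes back is os.fsencode(arg). A minimal sketch of that round trip, with a made-up byte string standing in for a real argument:

```python
import os

# Illustrative bytes: "Recent/" followed by two UTF-8 encoded Hebrew letters.
raw = b"Recent/\xd7\xa8\xd7\xa9"

# os.fsdecode applies the filesystem encoding; on Unix it uses the
# surrogateescape error handler, so undecodable bytes are not lost.
arg = os.fsdecode(raw)

# os.fsencode reverses the step and gives back the original bytes.
recovered = os.fsencode(arg)
```

So on Python 3 you normally use sys.argv as is, and call os.fsencode only when you need the exact bytes, for example to pass a file name on unchanged.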

Upvotes: 3
