I'm using the optparse module to retrieve a string value. optparse only supports str-typed strings, not unicode ones.
So let's say I start my script with:
./someScript --some-option ééééé
French characters such as 'é', which arrive typed as str, trigger UnicodeDecodeErrors when read in the code:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 99: ordinal not in range(128)
I played around a bit with the unicode built-in function, but either I get an error, or the character disappears:
>>> unicode('é');
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
>>> unicode('é', errors='ignore');
u''
Is there anything I can do to use optparse to retrieve unicode/utf-8 strings?
It seems that the string can be retrieved and printed fine, but when I then use it with SQLite (via the APSW module), cursor.execute("...") somehow tries to convert it to unicode, and that is where the error occurs.
Here is a sample program that causes the error:
#!/usr/bin/python
# coding: utf-8
import os, sys, optparse
parser = optparse.OptionParser()
parser.add_option("--some-option")
(opts, args) = parser.parse_args()
print unicode(opts.some_option)
Upvotes: 6
Views: 2118
Reputation: 1
#!/usr/bin/python
# coding: utf-8
import os, sys, optparse
reload(sys)
sys.setdefaultencoding('utf-8')
parser = optparse.OptionParser()
parser.add_option(u"--some-option")
(opts, args) = parser.parse_args()
parser.print_help()
Upvotes: 0
Reputation: 178115
Input is returned in the console encoding, so based on your updated example, use:
print opts.some_option.decode(sys.stdin.encoding)
unicode(opts.some_option) defaults to using ascii as the encoding.
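To make the point about the ascii default concrete, here is a minimal sketch. It uses a hard-coded byte string in place of a real command-line argument, and decode_arg is a hypothetical helper name, not something optparse provides. The fallback to the locale encoding is an assumption for the case where sys.stdin.encoding is None (for example, when input is piped).

```python
import locale
import sys


def decode_arg(value, encoding=None):
    """Hypothetical helper: turn a byte-string argument into text."""
    if not isinstance(value, bytes):
        # Already text (for instance on Python 3), nothing to decode.
        return value
    # Assumption: fall back to the locale encoding, then UTF-8, when the
    # console encoding is unavailable (sys.stdin.encoding can be None).
    encoding = (encoding
                or getattr(sys.stdin, "encoding", None)
                or locale.getpreferredencoding()
                or "utf-8")
    return value.decode(encoding)


raw = b"\xc3\xa9"  # the UTF-8 bytes of 'é', as a Python 2 str would hold them

try:
    raw.decode("ascii")  # what unicode(raw) does implicitly on Python 2
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Decoding with an explicit encoding succeeds where the ascii default fails.
text = decode_arg(raw, "utf-8")
```

In the original script this would be used as decode_arg(opts.some_option), leaving the encoding to be detected from the console.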
Upvotes: 1
Reputation: 9484
You could decode the arguments before the parser handles them. Taking your example:
#!/usr/bin/python
# coding: utf-8
import os, sys, optparse
parser = optparse.OptionParser()
parser.add_option("--some-option")
# Decode the command line arguments to unicode
for i, a in enumerate(sys.argv):
    sys.argv[i] = a.decode('ISO-8859-15')
(opts, args) = parser.parse_args()
print type(opts.some_option), opts.some_option
This gives the following output:
C:\workspace>python file.py --some-option préférer
<type 'unicode'> préférer
I've chosen the ISO/IEC 8859-15 code page, as it seems the most appropriate for your case. Adapt it if needed.
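A variation on the same idea, as a rough sketch: parse_args() accepts an explicit argument list, so the decoded values can be passed in directly instead of overwriting sys.argv. The function name parse_decoded and the UTF-8 default are assumptions for illustration (the 0xc3 byte in the question's traceback suggests a UTF-8 terminal, but that is a guess). The byte strings stand in for what sys.argv[1:] would contain on Python 2.

```python
import optparse


def parse_decoded(argv, encoding="utf-8"):
    """Hypothetical wrapper: decode byte-string arguments, then parse them."""
    parser = optparse.OptionParser()
    parser.add_option("--some-option")
    # Decode only byte strings; text arguments are passed through unchanged.
    decoded = [a.decode(encoding) if isinstance(a, bytes) else a for a in argv]
    # parse_args() takes the argument list without the program name.
    return parser.parse_args(decoded)


# Simulated command line: --some-option préférer (UTF-8 encoded)
opts, args = parse_decoded([b"--some-option", b"pr\xc3\xa9f\xc3\xa9rer"])
```

With a real command line, the call would be parse_decoded(sys.argv[1:], encoding) with whatever encoding the terminal actually uses.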
Upvotes: 4
Reputation: 24336
I believe your error is related to the following:
For example, to write Unicode literals including the Euro currency symbol, the ISO-8859-15 encoding can be used, with the Euro symbol having the ordinal value 164. This script will print the value 8364 (the Unicode codepoint corresponding to the Euro symbol) and then exit:
# -*- coding: iso-8859-15 -*-
currency = u"€"
print ord(currency)
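The same mapping can be checked at runtime without relying on the source-file encoding declaration. This is only a small sketch of the byte-to-codepoint relationship described in the quote; the variable names are illustrative.

```python
euro_byte = b"\xa4"  # ordinal 164, the Euro symbol's position in ISO-8859-15

# Decoding with ISO-8859-15 yields the Euro sign (U+20AC).
currency = euro_byte.decode("iso-8859-15")
codepoint = ord(currency)

# The same byte decoded as ISO-8859-1 is a different character (U+00A4),
# which is why the chosen encoding has to match the actual input.
latin1_char = euro_byte.decode("iso-8859-1")
```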
Upvotes: 0