Reputation: 233
I am using Python 2.7.6 and I am trying to convert a basic unicode string to iso8859-15.
I get an error when trying to convert a string with non-ASCII chars. This would be OK if those chars did not exist in the iso8859-15 encoding, but in this case they do:
Example:
>>> import codecs
>>> a = "test"
>>> a
'test'
>>> a.encode ('iso8859-15')
'test'
>>> a = "ü"
>>> a
'\xfc'
>>> a.encode ('iso8859-15')
Error Code:
Traceback (most recent call last):
  File "<pyshell#20>", line 1, in <module>
    a.encode ('iso8859-15')
  File "C:\Python27\lib\encodings\iso8859_15.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 0: ordinal not in range(128)
In this case "ü" is a valid iso8859-15 char with the hex value FC, or 11111100 in binary.
Looking in "C:\Python27\lib\encodings\iso8859_15.py" there is the value FC in line 300:
47: decoding_table = (
48: u'\x00' # 0x00 -> NULL
.....
300: u'\xfc' # 0xFC -> LATIN SMALL LETTER U WITH DIAERESIS
How can I convert unicode strings with non-ascii chars like "ü" into 'iso8859-15'? If the function encode does not work in this case: How can I import the encoding_table list in lib\encodings\iso8859_15.py directly into my code?
Upvotes: 3
Views: 8246
Reputation: 1124518
You are trying to encode a byte string. The byte string is already encoded, so Python 2 first tries to decode it to unicode for you so that it can then encode it again, and it uses the default ASCII codec to do that.
The exception reflects this; you got a UnicodeDecodeError, not a UnicodeEncodeError.
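The implicit step can be spelled out by hand. This is only a minimal sketch of what Python 2 does behind the scenes when `.encode()` is called on a byte string; the names `raw` and `error_name` are made up for illustration, and the `b'...'` literal is used so the snippet behaves the same on Python 2.7 and Python 3:

```python
# -*- coding: utf-8 -*-
# The byte string the shell produced for "ü" in the question.
raw = b'\xfc'

error_name = None
try:
    # Python 2 runs the equivalent of this decode implicitly before it can
    # encode a byte string; 0xFC is outside ASCII, so the decode step fails.
    raw.decode('ascii').encode('iso8859-15')
except UnicodeDecodeError as exc:
    error_name = type(exc).__name__

print(error_name)
```

The failure happens in the decode half, before the iso8859-15 codec ever sees a character, which is why the codec table containing 0xFC does not help.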
To create unicode values, use u'...' unicode literals instead:
>>> a = u'ü'
>>> a
u'\xfc'
>>> a.encode('iso8859-15')
'\xfc'
or decode your bytestring data to Unicode using a valid encoding:
>>> a = 'ü'
>>> a.decode('utf8') # my terminal is configured to use UTF-8
u'\xfc'
>>> a.decode('utf8').encode('iso8859-15')
'\xfc'
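Both approaches can be folded into one small helper. This is a sketch only: `to_latin9` and its `source_encoding` parameter are hypothetical names, not part of any library, and the UTF-8 default is an assumption that has to match wherever the bytes actually came from (terminal, file, socket):

```python
# -*- coding: utf-8 -*-

def to_latin9(value, source_encoding='utf-8'):
    """Return value as ISO 8859-15 bytes (hypothetical helper).

    value may be unicode text or a byte string; byte strings are decoded
    with source_encoding first, which is an assumption about the input.
    """
    if isinstance(value, bytes):
        # Decode explicitly instead of relying on the implicit ASCII decode
        # that raised UnicodeDecodeError in the question.
        value = value.decode(source_encoding)
    return value.encode('iso8859-15')


# Unicode text for "ü".
print(repr(to_latin9(u'\xfc')))
# The UTF-8 bytes for the same letter.
print(repr(to_latin9(b'\xc3\xbc')))
# Bytes that are already ISO 8859-15 round-trip unchanged.
print(repr(to_latin9(b'\xfc', 'iso8859-15')))
```

On Python 2, `bytes` is an alias for `str`, so the `isinstance` check separates byte strings from `unicode` objects there as well. If the source encoding is guessed wrong, the decode step raises or silently yields the wrong characters, so it is safer to pass it explicitly than to depend on the default.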
Upvotes: 8