Encode Unicode to iso8859-15 with Python

Question

I am using Python 2.7.6 and I am trying to convert a basic unicode string to iso8859-15.

I get an error when trying to convert a string with non-ASCII characters. That would be fine if those characters did not exist in the ISO-8859-15 encoding, but in this case they do:

Example:

>>> import codecs
>>> a = "test"
>>> a
'test'
>>> a.encode ('iso8859-15')
'test'
>>> a = "ü"
>>> a
'\xfc'
>>> a.encode ('iso8859-15')

Error Code:

Traceback (most recent call last):
  File "", line 1, in <module>
    a.encode ('iso8859-15')
  File "C:\Python27\lib\encodings\iso8859_15.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 0: ordinal not in range(128)

In this case "ü" is a valid ISO-8859-15 character with the hex value FC (11111100 in binary). In "C:\Python27\lib\encodings\iso8859_15.py", the value FC appears on line 300:

    47:  decoding_table = (
    48:  u'\x00'     #  0x00 -> NULL
    .....
    300: u'\xfc'     #  0xFC -> LATIN SMALL LETTER U WITH DIAERESIS

How can I convert Unicode strings with non-ASCII characters like "ü" into ISO-8859-15? If encode() does not work in this case, how can I import the encoding_table list from lib\encodings\iso8859_15.py directly into my code?
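For reference, a minimal sketch of what that module exposes: `decoding_table` and `encoding_table` are ordinary attributes of the standard-library `encodings.iso8859_15` module, so they can be imported. They are codec implementation details rather than a documented public API, and `.encode('iso8859-15')` on a Unicode string already passes `encoding_table` to `codecs.charmap_encode`, exactly as the traceback shows. The sketch is written to run on both Python 2.7 and 3.

```python
# Sketch: inspect the tables quoted above (implementation details of the codec).
import codecs
import encodings.iso8859_15 as iso8859_15

# decoding_table maps each byte value 0..255 to one character.
decoding_table = iso8859_15.decoding_table
print(len(decoding_table))         # 256
print(repr(decoding_table[0xFC]))  # the character U+00FC, LATIN SMALL LETTER U WITH DIAERESIS

# encoding_table is the mapping the codec hands to charmap_encode.
# charmap_encode returns a (bytes, characters_consumed) pair.
encoded, consumed = codecs.charmap_encode(u'\xfc', 'strict', iso8859_15.encoding_table)
print(repr(encoded))               # the single byte 0xFC
```

The same result comes from `u'\xfc'.encode('iso8859-15')`; the table only does its job once the input is a Unicode string.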

Martijn Pieters · Accepted Answer

You are trying to encode a byte string. A byte string is already encoded, so Python 2 first tries to decode it for you so that it can then encode it again, and it uses the default ASCII codec to do that.

The exception reflects this: you got a UnicodeDecodeError, not a UnicodeEncodeError.
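A small sketch to make the distinction concrete. Python 2 performs the ASCII decode implicitly when `.encode()` is called on a byte string; here that step is written out as `.decode('ascii')` so the example behaves the same on Python 2.7 and 3 (on Python 3, byte strings have no `.encode()` method at all). The helper name `failing_step` is hypothetical, used only for illustration, and U+2603 (SNOWMAN) is just an arbitrary character that ISO-8859-15 does not contain.

```python
def failing_step(action):
    """Hypothetical helper: run action() and return the name of the
    UnicodeError subclass it raises, or None if it succeeds."""
    try:
        action()
    except UnicodeError as exc:  # base class of UnicodeDecodeError and UnicodeEncodeError
        return type(exc).__name__
    return None

# The decode Python 2 does behind the scenes for a byte string: 0xFC is not ASCII.
print(failing_step(lambda: b'\xfc'.decode('ascii')))         # UnicodeDecodeError
# Encoding real Unicode text only fails when the target encoding lacks a character.
print(failing_step(lambda: u'\u2603'.encode('iso8859-15')))  # UnicodeEncodeError
# U+00FC exists in ISO-8859-15, so this succeeds.
print(failing_step(lambda: u'\xfc'.encode('iso8859-15')))    # None
```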

To create unicode values, use u'...' unicode literals instead:

>>> a = u'ü'
>>> a
u'\xfc'
>>> a.encode('iso8859-15')
'\xfc'
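A sketch of a round trip with a Unicode literal, using one character ISO-8859-15 shares with ISO-8859-1 ("ü") and one it adds over ISO-8859-1 (the euro sign, byte 0xA4). The character choice is illustrative only; the code runs on Python 2.7 and 3.

```python
# Sketch: encode Unicode text to ISO-8859-15 and decode it back.
text = u'\xfc\u20ac'  # u'ü€'
data = text.encode('iso8859-15')
print(repr(data))      # bytes 0xFC (ü) and 0xA4 (euro sign)

restored = data.decode('iso8859-15')
print(restored == text)  # True: the round trip is lossless

# ISO-8859-1 (latin-1) has no euro sign, so the same text cannot be encoded there.
try:
    text.encode('latin-1')
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False
print(latin1_ok)  # False
```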

or decode your byte string data to Unicode using the encoding it was actually produced with:

>>> a = 'ü'
>>> a.decode('utf8')  # my terminal is configured to use UTF-8
u'\xfc'
>>> a.decode('utf8').encode('iso8859-15')
'\xfc'
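Both options can be combined in one small function. This is a hypothetical helper, not part of the answer above: it accepts either a Unicode string or a byte string, decodes byte strings with an explicitly chosen source encoding, and then encodes to ISO-8859-15. The default `source_encoding='utf-8'` is an assumption matching the UTF-8 terminal above; the byte string from the question (`'\xfc'`) is not valid UTF-8 and would need a single-byte encoding such as `'latin-1'` instead.

```python
def to_iso8859_15(value, source_encoding='utf-8', errors='strict'):
    """Hypothetical helper: return value encoded as ISO-8859-15 bytes.

    Byte strings (str on Python 2, bytes on Python 3) are decoded first with
    source_encoding, which must match how those bytes were produced.
    """
    if isinstance(value, bytes):  # on Python 2, bytes is an alias for str
        value = value.decode(source_encoding)
    return value.encode('iso8859-15', errors)

print(repr(to_iso8859_15(u'\xfc')))                            # the byte 0xFC
print(repr(to_iso8859_15(b'\xc3\xbc')))                        # UTF-8 "ü" -> the byte 0xFC
print(repr(to_iso8859_15(b'\xfc', source_encoding='latin-1'))) # the byte 0xFC
print(repr(to_iso8859_15(u'\u2603', errors='replace')))        # unmappable -> '?'
```

Passing `source_encoding` explicitly keeps the guess about where the bytes came from visible at the call site instead of relying on the ASCII default.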
