cb0008

Reputation: 143

How to convert ISO-8859-1 to UTF-8 using Python 3.7.4

How can I convert text in ISO-8859-1/latin1 to UTF-8 using Python 3.7.4 (32-bit)?

This is what I tried:

>>> inputText = "\xC4pple"
>>> inputText.decode('iso-8859-1').encode('utf8')

And it returned this error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute 'decode'

What am I doing wrong?

Upvotes: 0

Views: 4931

Answers (1)

Woodford

Reputation: 4449

decode is a method of the bytes type; in Python 3, str has no decode method:

>>> help(bytes.decode)
Help on method_descriptor:

decode(self, /, encoding='utf-8', errors='strict')
    Decode the bytes using the codec registered for encoding.
    
    encoding
      The encoding with which to decode the bytes.
    errors
      The error handling scheme to use for the handling of decoding errors.
      The default is 'strict' meaning that decoding errors raise a
      UnicodeDecodeError. Other possible values are 'ignore' and 'replace'
      as well as any other name registered with codecs.register_error that
      can handle UnicodeDecodeErrors.

So inputText needs to be of type bytes, not str:

>>> inputText = b"\xC4pple"
>>> inputText.decode('iso-8859-1')
'Äpple'
>>> inputText.decode('iso-8859-1').encode('utf8')
b'\xc3\x84pple'

Note that decode returns a str, while encode returns bytes.
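
If the text is already a str, as in the question, there is nothing to decode: a Python 3 str literal such as "\xC4pple" already holds the code point U+00C4, so it only needs to be encoded. The sketch below shows that case, plus one possible way to re-encode a whole file. The helper name convert_file and the temporary file names are made up for illustration and are not part of any library.

```python
import os
import tempfile

# In Python 3 a str literal already holds Unicode code points,
# so "\xC4pple" is the text 'Äpple' and only needs encoding.
text = "\xC4pple"
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)  # b'\xc3\x84pple'


def convert_file(src_path, dst_path):
    """Re-encode a text file from ISO-8859-1 to UTF-8 (hypothetical helper)."""
    # newline="" keeps line endings exactly as they are in the source file.
    with open(src_path, "r", encoding="iso-8859-1", newline="") as src:
        content = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(content)


# Demo using temporary files (names are placeholders).
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "latin1.txt")
    dst = os.path.join(tmp, "utf8.txt")
    with open(src, "wb") as f:
        f.write(b"\xC4pple\n")  # raw ISO-8859-1 bytes
    convert_file(src, dst)
    with open(dst, "rb") as f:
        converted = f.read()
    print(converted)  # b'\xc3\x84pple\n'
```

Passing encoding= to open lets Python do the decode and encode steps itself, so the code never handles raw bytes except in the demo setup.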

Upvotes: 1
