Handling Unicode in Python 2.7 when saving a string to a file

Question

Dealing with Unicode is my only real challenge when programming in Python. I had many problems with it in past projects, and I always brute-forced my way out by trying different encodings until something worked. (If there is a tutorial for beginners, it would be very handy.)

For example I have this code:

# -*- coding: utf-8 -*-
string = "Åland Islands"
with open("1.txt", "w") as f:
    f.write(string.decode("utf-8"))

It fails with:

   return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xc5 in position 0: invalid continuation byte

I tried many encodings to solve this, with no luck.

chepner · Accepted Answer

The coding line just tells the Python interpreter how it should interpret the bytes of the source file. That doesn't mean the script actually contains UTF-8-encoded text. In fact, the error message suggests that the file was saved as ISO-8859-1 (Latin-1) text: 0xc5 is the Latin-1 encoding of Å, whereas its UTF-8 encoding is the two bytes 0xc3 0x85.
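To make the byte-level difference concrete, here is a minimal sketch (not part of the original answer). The variable names are illustrative, and the bytes are written as escapes so the snippet behaves the same however it is saved. It should run on Python 2.7 as well as Python 3.

```python
# The same text, as two different editors would store it on disk.
latin1_bytes = b"\xc5land Islands"      # Latin-1: one byte (0xc5) for the A-with-ring
utf8_bytes = b"\xc3\x85land Islands"    # UTF-8: two bytes (0xc3 0x85) for the same letter

# Decoding each byte string with the codec that matches it gives identical text.
text_from_latin1 = latin1_bytes.decode("latin-1")
text_from_utf8 = utf8_bytes.decode("utf-8")
same_text = text_from_latin1 == text_from_utf8

# Decoding the Latin-1 bytes as UTF-8 reproduces the failure from the question:
# 0xc5 announces a two-byte sequence, but the next byte ("l") is not a
# continuation byte.
try:
    latin1_bytes.decode("utf-8")
    mismatch_error = None
except UnicodeDecodeError as exc:
    mismatch_error = exc
```

Here `mismatch_error.start` is 0, matching "position 0" in the traceback above.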

You need to make sure your editor actually saves the file as UTF-8 encoded text, so that the coding line isn't lying to the interpreter.
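If you are unsure what your editor actually wrote, one way to check is to read the script's raw bytes and try to decode them as UTF-8. The sketch below is an assumption-laden illustration rather than part of the original answer: `is_valid_utf8` is a hypothetical helper, and the two sample files are created in a temporary directory only to demonstrate it.

```python
import io
import os
import tempfile


def is_valid_utf8(path):
    """Hypothetical helper: True if the file's raw bytes decode cleanly as UTF-8."""
    with io.open(path, "rb") as f:
        raw = f.read()
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Two stand-ins for the script from the question, one per editor encoding.
tmp_dir = tempfile.mkdtemp()
samples = {
    "latin1.py": b'string = "\xc5land Islands"\n',      # saved as Latin-1
    "utf8.py": b'string = "\xc3\x85land Islands"\n',    # saved as UTF-8
}

results = {}
for name, raw in samples.items():
    path = os.path.join(tmp_dir, name)
    with io.open(path, "wb") as f:
        f.write(raw)
    results[name] = is_valid_utf8(path)
```

With these samples, the Latin-1 file fails the check and the UTF-8 file passes. Passing only shows that the bytes are valid UTF-8, not that the editor intended that encoding, but a Latin-1 file containing Å will fail as shown.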
