Pamela White

Reputation: 3

Handling Unicode in Python 2.7 when saving a string to a file

Dealing with Unicode is my main challenge when programming in Python. I had many problems with it in my past project, and I always brute-forced my way out by trying different encodings until something worked (a pointer to a beginner tutorial would be very handy).

For example I have this code:

# -*- coding: utf-8 -*-
string = "Åland Islands"
with open("1.txt", "w") as f:
    f.write(string.decode("utf-8"))

It raises:

   return codecs.utf_8_decode(input, errors, True)

UnicodeDecodeError: 'utf8' codec can't decode byte 0xc5 in position 0: invalid continuation byte

I tried many encodings to solve this, with no luck.

Upvotes: 0

Views: 270

Answers (1)

chepner

Reputation: 531718

The coding line only tells the Python interpreter how to decode the bytes of the source file. It does not guarantee that the script actually contains UTF-8-encoded text. In fact, the error message suggests that the file was saved as ISO-8859-1 (Latin-1) text: 0xc5 is the Latin-1 encoding of Å, whereas 0xc3 0x85 is its UTF-8 encoding.
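To see the difference between the two encodings, here is a small sketch. It writes Å as the escape `\u00c5` so that the result does not depend on how this snippet itself is saved; the variable names are only illustrative.

```python
# -*- coding: utf-8 -*-
# The escape \u00c5 is the code point for the letter A with ring above,
# so no non-ASCII byte appears in this source file.
text = u"\u00c5land Islands"

latin1_bytes = text.encode("latin-1")  # the first character becomes the single byte 0xC5
utf8_bytes = text.encode("utf-8")      # the first character becomes the two bytes 0xC3 0x85

# Decoding Latin-1 bytes as UTF-8 fails, because 0xC5 announces a
# two-byte sequence and the following byte ("l") is not a continuation byte.
try:
    latin1_bytes.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
```

This is the same failure as in the question: the interpreter hands over Latin-1 bytes, and `.decode("utf-8")` rejects them.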

You need to make sure your editor actually saves the file as UTF-8 encoded text, so that the coding line isn't lying to the interpreter.
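Once the source file really is UTF-8, note that in Python 2 writing a unicode object to a file opened with the built-in `open` triggers an implicit ASCII encode, which fails for Å. One possible way around that, shown here only as a sketch and not as the only approach, is `io.open` with an explicit `encoding`. The escape `\u00c5` again stands in for the literal Å so the example does not depend on the editor's settings.

```python
# -*- coding: utf-8 -*-
import io

# A unicode string; with a correctly saved UTF-8 source file,
# u"Åland Islands" would be equivalent.
text = u"\u00c5land Islands"

# io.open encodes the text to UTF-8 on write instead of relying on
# the default ASCII codec.
with io.open("1.txt", "w", encoding="utf-8") as f:
    f.write(text)

# Read the raw bytes back to confirm what was stored on disk.
with io.open("1.txt", "rb") as f:
    raw = f.read()
```

The file then starts with the bytes 0xc3 0x85, the UTF-8 encoding of Å mentioned above.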

Upvotes: 2
