Reputation: 3
Dealing with Unicode is my only real challenge when programming in Python. I had many problems with it in my last project, and I always brute-forced my way out by trying different encodings until something worked (if there is a tutorial for beginners, it would be very handy).
For example I have this code:
# -*- coding: utf-8 -*-
string = "Åland Islands"
with open("1.txt", "w") as f:
    f.write(string.decode("utf-8"))
It raises:
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xc5 in position 0: invalid continuation byte
I tried many encodings to solve this, with no luck.
Upvotes: 0
Views: 270
Reputation: 531718
The coding line just tells the Python interpreter how it should interpret the bytes of the source file. That doesn't mean the script actually contains UTF-8-encoded text. In fact, the error message suggests that the file was saved as ISO-8859-1 (Latin-1) text: 0xc5 is the Latin-1 encoding for Å, while 0xc3 0x85 is the UTF-8 encoding.
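To see the byte difference for yourself, here is a minimal sketch. The variable names are only illustrative, and the character is written as the escape `\u00c5` so the result does not depend on how the script file itself was saved.

```python
# Illustrative names; \u00c5 is the code point for the letter A with ring above.
text = u"\u00c5land Islands"

latin1_bytes = text.encode("latin-1")  # starts with the single byte 0xc5
utf8_bytes = text.encode("utf-8")      # starts with the two bytes 0xc3 0x85

# Decoding the Latin-1 bytes as UTF-8 fails in the same way as in the question:
# 0xc5 announces a two-byte sequence, but the next byte ("l") is not a
# continuation byte.
decode_error = None
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    decode_error = exc

print(repr(latin1_bytes[:1]))
print(repr(utf8_bytes[:2]))
print(decode_error)
```

The same bytes decode cleanly with `latin1_bytes.decode("latin-1")`, which is why the encoding declared in the coding line has to match the one the file was really saved in.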
You need to make sure your editor actually saves the file as UTF-8 encoded text, so that the coding line isn't lying to the interpreter.
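If you are unsure what your editor actually wrote to disk, one rough check is to read the raw bytes and try to decode them as UTF-8. The helper `is_valid_utf8` and the temporary file names below are hypothetical, made up for this sketch. A clean decode does not prove the file was meant to be UTF-8, but a failure does show that the coding line is wrong.

```python
import io
import os
import tempfile


def is_valid_utf8(path):
    """Return True if the file's raw bytes decode cleanly as UTF-8."""
    with io.open(path, "rb") as f:
        raw = f.read()
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Demo: write the same source line once as UTF-8 and once as Latin-1.
tmp_dir = tempfile.mkdtemp()
utf8_path = os.path.join(tmp_dir, "saved_as_utf8.py")
latin1_path = os.path.join(tmp_dir, "saved_as_latin1.py")
source = u'string = "\u00c5land Islands"\n'

with io.open(utf8_path, "wb") as f:
    f.write(source.encode("utf-8"))
with io.open(latin1_path, "wb") as f:
    f.write(source.encode("latin-1"))

utf8_ok = is_valid_utf8(utf8_path)      # True
latin1_ok = is_valid_utf8(latin1_path)  # False: contains a lone 0xc5 byte
print(utf8_ok, latin1_ok)
```

Running `is_valid_utf8` on your own script would tell you whether re-saving it as UTF-8 in the editor is needed.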
Upvotes: 2