Reputation: 24748
I am using this package here: HTML.py 0.04
Here is what I am doing:
import html
h = html.HTML()
h.p('Some simple Euro: €1.14')
h.p(u'Some Euro: €1.14')
Now when I call unicode(h), I get an error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 18: ordinal not in range(128)
What is the best way to handle this? I need to write the html to a file.
Upvotes: 0
Views: 434
Reputation: 536329
h.p('Some simple Euro: €1.14')
You should avoid byte strings ('' in Python 2, b'' in Python 3) for HTML content. The character model of HTML is Unicode, so only Unicode strings (u'') should be used.
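Since the goal is to write the HTML to a file, here is a minimal sketch of the usual pattern: keep the markup as Unicode text and encode it only when it reaches the file. The name `markup` is a hypothetical stand-in for the result of unicode(h) once every h.p(...) call has been given a u'' literal, and the temporary output path is also just for illustration.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Hypothetical stand-in for unicode(h); a Unicode string, not a byte string.
markup = u'<p>Some Euro: \u20ac1.14</p>'

# Illustrative output location; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), 'out.html')

# io.open encodes the text on the way out, so no implicit ASCII
# conversion is involved. It behaves the same on Python 2 and 3.
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(markup)

# Read the raw bytes back to see what actually landed on disk.
with io.open(path, 'rb') as f:
    data = f.read()
```

The encoding is chosen explicitly in one place, at the file boundary, so the page should also declare the same charset (UTF-8 here).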
You can get away with doing it wrong for simple ASCII characters. Because most common byte encodings are supersets of ASCII, Python 2 will implicitly convert ASCII-only byte strings to Unicode. But the € character isn't part of ASCII, so Python can't tell how to read it. If you have saved the source code above using the UTF-8 encoding, then you have the byte string b'\xe2\x82\xac', which could mean € (UTF-8), â‚¬ (Windows-1252), 竄ャ (Shift-JIS), or many other character sequences depending on what encoding is used.
Upvotes: 1