Reputation: 110382
I have run into a character encoding problem as follows:
rating = 'Barntillåten'
new_file = codecs.open(os.path.join(folder, "metadata.xml"), 'w', 'utf-8')
new_file.write(
"""<?xml version="1.0" encoding="UTF-8"?>
<ratings>
<rating system="%s">%s</rating>
</ratings>""" % (values['rating_system'], rating))
The error I get is:
File "./assetshare.py", line 314, in write_file
</ratings>""" % (values['rating_system'], rating))
I know that the encoding error is related to 'Barntillåten', because if I replace that word with 'test', the function works fine.
Why is this encoding error happening and what do I need to do to fix it?
Upvotes: 0
Views: 1847
Reputation: 157414
In Python 2, the file object returned by codecs.open expects to be given unicode objects to write. You're passing it a str.
The fix is to ensure that the data you pass it is unicode:
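# Format the byte string first, then decode it to unicode before writing.
# This assumes the source file (and therefore the str literal) is UTF-8.
new_file.write((
"""<?xml version="1.0" encoding="UTF-8"?>
<ratings>
<rating system="%s">%s</rating>
</ratings>""" % (values['rating_system'], rating)
).decode('utf-8'))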
If you use unicode literals (u"...") then Python 2 promotes the result to unicode whenever a unicode object is mixed with a str, as happens in % formatting. Here it would be sufficient to have rating = u'Barntillåten':
rating = u'Barntillåten'
new_file = codecs.open(os.path.join(folder, "metadata.xml"), 'w', 'utf-8')
new_file.write(
"""<?xml version="1.0" encoding="UTF-8"?>
<ratings>
<rating system="%s">%s</rating>
</ratings>""" % (values['rating_system'], rating))
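For reference, here is a self-contained sketch of the same approach that can be run as-is. The temporary folder and the 'example-system' value are hypothetical stand-ins for the question's folder and values, and the å is written as the escape \xe5 so the snippet does not depend on the source file's encoding.

```python
import codecs
import os
import tempfile

# Hypothetical stand-ins for the question's `folder` and `values`.
folder = tempfile.mkdtemp()
values = {'rating_system': u'example-system'}

# Unicode text; \xe5 is the code point for "å".
rating = u'Barntill\xe5ten'

# The template is a unicode literal too, so the formatted result is unicode.
template = u"""<?xml version="1.0" encoding="UTF-8"?>
<ratings>
<rating system="%s">%s</rating>
</ratings>"""

path = os.path.join(folder, "metadata.xml")
new_file = codecs.open(path, 'w', 'utf-8')
try:
    # The codecs wrapper encodes the unicode text to UTF-8 bytes on write.
    new_file.write(template % (values['rating_system'], rating))
finally:
    new_file.close()

# Read the raw bytes back to see what actually landed on disk.
with open(path, 'rb') as handle:
    written = handle.read()
```

Reading the file back in binary mode shows the UTF-8 encoded bytes, with å stored as two bytes.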
You can write a str object to a codecs.open file, but only if the str can be decoded with the default encoding. In practice that means it is only safe when the str is plain ASCII. The default encoding is ASCII and should be left that way; see Changing default encoding of Python?
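To illustrate why only plain ASCII is safe, the sketch below performs the ASCII decode explicitly, which is roughly what Python 2 does implicitly when a str reaches the writer. The helper name decodes_as_ascii is made up for this example, and the byte string is built with encode('utf-8') on the assumption that the source file was saved as UTF-8.

```python
# Unicode text, with "å" written as an escape.
rating_text = u'Barntill\xe5ten'

# Assumption: these are the bytes a Python 2 str literal would hold
# if the source file were saved as UTF-8.
rating_bytes = rating_text.encode('utf-8')


def decodes_as_ascii(data):
    """Return True if the byte string can be decoded with the ASCII codec."""
    try:
        data.decode('ascii')
        return True
    except UnicodeDecodeError:
        return False


ascii_ok = decodes_as_ascii(b'test')            # plain ASCII bytes
non_ascii_ok = decodes_as_ascii(rating_bytes)   # contains the bytes for "å"
```

The first check succeeds and the second fails, which matches the question's observation that 'test' works while 'Barntillåten' does not.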
Upvotes: 2
Reputation: 799120
You need to use unicode literals.
u'...'
u"..."
u'''......'''
u"""......"""
Upvotes: 1
Reputation: 204926
rating must be a Unicode string in order to contain Unicode codepoints.
rating = u'Barntillåten'
Otherwise, in Python 2, the non-Unicode string 'Barntillåten' contains bytes (encoded with whatever your source encoding was), not codepoints.
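A small sketch of the difference between codepoints and bytes: the same text has a different length depending on whether you count characters or encoded bytes, and the byte count depends on the encoding. UTF-8 and Latin-1 are used here only as example source encodings.

```python
# Unicode text: a sequence of code points ("å" is the single code point U+00E5).
text = u'Barntill\xe5ten'

# The same text as bytes under two example encodings.
utf8_bytes = text.encode('utf-8')      # "å" becomes two bytes
latin1_bytes = text.encode('latin-1')  # "å" becomes one byte

codepoint_count = len(text)
utf8_byte_count = len(utf8_bytes)
latin1_byte_count = len(latin1_bytes)
```

Decoding the bytes with the encoding they were written in gives back the original Unicode string; decoding with the wrong one does not.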
Upvotes: 3