Reputation: 25
I am having problems converting Unicode to HTML entities.
Here is my current code:
>>> name = u'\xc3\xa1\xc3\xa1\xc3\xa1\xc3\xa1'
>>> entities = name.encode('ascii', 'xmlcharrefreplace')
>>> print str(entities)
&#195;&#161;&#195;&#161;&#195;&#161;&#195;&#161;
Each \xc3\xa1 is á (a multibyte character), but when I convert it to entities, I get two entities for a single character.
Upvotes: 1
Views: 3480
Reputation: 32560
\xc3\xa1 is á in UTF-8, not in Unicode. (áááá in Unicode would be u'\xe1\xe1\xe1\xe1'.)
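To make the distinction concrete, here is a small sketch (the variable names are only illustrative) showing that á is a single code point that UTF-8 stores as two bytes, and that writing those two byte values inside a unicode literal yields two unrelated characters instead. It should run unchanged on Python 2.7 and Python 3:

```python
import unicodedata

# One Unicode code point: U+00E1, LATIN SMALL LETTER A WITH ACUTE.
a_acute = u'\xe1'

# Its UTF-8 encoding is the two-byte sequence C3 A1.
utf8_bytes = a_acute.encode('utf-8')
assert utf8_bytes == b'\xc3\xa1'

# Inside a unicode literal, \xc3 and \xa1 are two separate code points,
# which is why the question ends up with two entities per character.
mistaken = u'\xc3\xa1'
char_names = [unicodedata.name(c) for c in mistaken]
print(char_names)
```

The two names printed are those of U+00C3 and U+00A1, not of á.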
You therefore need to use a byte string literal to define it, not a unicode literal ('' vs u''). Once you have the UTF-8 bytes, you need to decode them to Unicode in order to encode the result again to ASCII with XML entities:
>>> name = '\xc3\xa1\xc3\xa1\xc3\xa1\xc3\xa1'.decode('utf-8')
>>> name.encode('ascii', 'xmlcharrefreplace')
'&#225;&#225;&#225;&#225;'
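The same idea can be wrapped in two small helpers. This is only a sketch: `to_entities` and `repair_mojibake` are hypothetical names, not part of any library, and the repair step assumes the bad unicode string came from UTF-8 bytes that were stored one byte per code point (which is what the literal in the question does). It uses `b''` and `u''` prefixes so it should run on both Python 2.7 and Python 3:

```python
def to_entities(text):
    """Hypothetical helper: encode a Unicode string as ASCII bytes,
    replacing non-ASCII characters with numeric character references."""
    return text.encode('ascii', 'xmlcharrefreplace')


def repair_mojibake(text):
    """Hypothetical helper: recover the intended text when UTF-8 bytes
    were stored as individual code points. Latin-1 maps code points
    0-255 back to the same byte values, so the bytes can be re-decoded."""
    return text.encode('latin-1').decode('utf-8')


# Case 1: start from real UTF-8 bytes and decode them first.
name = (b'\xc3\xa1' * 4).decode('utf-8')
print(to_entities(name))

# Case 2: the unicode string from the question, repaired before encoding.
broken = u'\xc3\xa1' * 4
print(to_entities(repair_mojibake(broken)))
```

Both calls should produce four `&#225;` references. If the string is already correct Unicode, `repair_mojibake` must not be applied, since it will either raise an error or silently corrupt the text.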
Upvotes: 8