Reputation: 83
I seem to be completely unable to get around utf-8 character encoding.
So I'm exporting content from a database as a utf-8 xml file. The software I am importing into is quite strict about character encoding, so I can't just put everything in CDATA tags.
There's a whole bunch of weird characters, e.g. ’, — … already in the data.
These aren't working in the xml and need to be replaced out (normally with just a ' quote).
Ideally, I'd like to decode all the characters, and then use htmlspecialchars($text, ENT_COMPAT, 'UTF-8', FALSE) to encode them back again. But I can't seem to find a function that will decode them. Is there one? I've started to manually go through each entity with a str_replace() but it's turning into a much bigger job than I anticipated.
Any help would be a lifesaver. Thanks
Upvotes: 0
Views: 2373
Reputation: 47034
html_entity_decode() perhaps?
in some cases, in character conversion issues in php, it is important to have a locale set. Doesn't matter which, e.g.
setlocale(LC_CTYPE,'en_US.utf8');
But I would advise that any time invested in getting the encoding right from the beginning, without reverting to entities, if at all possible, is worth it.
Upvotes: 2