Leon

Reputation: 83

HTML entity decoding in PHP

I seem to be completely unable to get my head around UTF-8 character encoding.

I'm exporting content from a database as a UTF-8 XML file. The software I am importing into is quite strict about character encoding, so I can't just put everything in CDATA tags.

There's a whole bunch of weird characters, e.g. &rsquo;, &mdash;, &hellip;, already in the data.

These aren't working in the XML and need to be replaced (normally with just a ' quote).

Ideally, I'd like to decode all the characters, and then use htmlspecialchars($text, ENT_COMPAT, 'UTF-8', FALSE) to encode them back again. But I can't seem to find a function that will decode them. Is there one? I've started to manually go through each entity with a str_replace() but it's turning into a much bigger job than I anticipated.

Any help would be a lifesaver. Thanks

Upvotes: 0

Views: 2373

Answers (1)

mvds

Reputation: 47034

html_entity_decode() perhaps?
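A minimal sketch of the decode-then-re-encode round trip the question describes, assuming PHP 7 or later and input that is already valid UTF-8. The helper name decode_then_escape() and the sample string are hypothetical, made up for illustration; the htmlspecialchars() call is the one quoted in the question.

```php
<?php
// Hypothetical helper: decode all HTML entities, then escape only XML's special characters.
function decode_then_escape(string $text): string
{
    // Turn named and numeric entities (e.g. &rsquo;, &#8212;) into real UTF-8 characters.
    $decoded = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Re-encode &, <, > and " so the result is safe inside XML text or attributes.
    // The arguments mirror the call from the question.
    return htmlspecialchars($decoded, ENT_COMPAT, 'UTF-8', false);
}

// Made-up sample input, not taken from the asker's data.
echo decode_then_escape('It&rsquo;s here &mdash; wait&hellip; Fish &amp; chips'), "\n";
```

The typographic characters come out as literal UTF-8, which a UTF-8 XML file can carry directly, and only the ampersand is escaped again. If a plain ' is wanted instead of the curly quote, a single str_replace() on the decoded string should be enough.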

In some cases, with character conversion issues in PHP, it is important to have a locale set. It doesn't matter which one, e.g.

setlocale(LC_CTYPE,'en_US.utf8');

But I would advise that any time invested in getting the encoding right from the beginning, without resorting to entities if at all possible, is worth it.

Upvotes: 2
