Reputation: 349
I got a system which previously the html encoding type was set as ISO-8859-1 and it caused all the Chinese characters store in the format of "&\#36830;&\#34915;&\#35033;"
.
So my question is, how can I convert the format above into Chinese word back in UTF-8
?
For your information, I had tried with utf8_decode, iconv, but none of them work. :(
Thank you very much.
Upvotes: 2
Views: 4926
Reputation: 522042
The current text encoding of that string is rather insubstantial. What you have there are HTML entities; they have little to do with the underlying "physical" encoding like ISO-8859 or UTF-8. What you want is to decode those HTML entities into a byte representation of the characters in a specific encoding, in this case to UTF-8. Therefore:
echo html_entity_decode('连衣裙', ENT_COMPAT, 'UTF-8');
// 连衣裙
Upvotes: 1
Reputation: 201568
There are many tools that can convert character references to characters, and writing such a tool is rather straightforward, especially if you know the references are all decimal. So the answer really depends on the software environment.
For example, to do such a conversion for an individual HTML document, you could use the BabelPad editor: command Convert → Numeric Character References (NCR) → NCR to Unicode, and save the result as UTF-8.
Upvotes: 0
Reputation: 3059
You need to use:
utf8_encode($data);
and not decode,to convert your current ISO-8859-1 to UTF-8.
Some native PHP functions such as strtolower(), strtoupper() and ucfirst() do not always function correctly with UTF-8 strings. Possible solutions: convert to latin first or add the following line to your code:
setlocale(LC_CTYPE, 'C');
Make sure not to save your PHP files using a BOM (Byte-Order Marker) UTF-8 file marker (your browser might show these BOM characters between PHP pages on your site).
Just for your reference:
ISO-8859-1 => Albanian, Brazilian, Catalan, Danish, Dutch, English, Finnish, French, German, Portuguese, Norwegian, Spanish, Swedish
UTF-8 => Chinese (simplified), Chinese (traditional), Japanese, Persian
Upvotes: 1