Reputation: 857
I have a problem converting Unicode escape sequences to UTF-8. Here is my code:
<?php
$unicode = '\u0411. \u0426\u044d\u0446\u044d\u0433\u0441\u04af\u0440\u044d\u043d';
$utf8string = html_entity_decode(preg_replace("/U\+([0-9A-F]{4})/", "&#x\\1;", $unicode), ENT_NOQUOTES, 'UTF-8');
echo $utf8string;
?>
It outputs the following:
\u0411. \u0426\u044d\u0446\u044d\u0433\u0441\u04af\u0440\u044d\u043d
What did I do wrong? Any advice?
Upvotes: 0
Views: 376
Reputation: 99806
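At the very least, your regular expression looks for an uppercase U followed by a plus sign (the U+0411 style) and uppercase hex digits, while all of your escape sequences use a backslash, a lowercase u, and lowercase hex digits, so the pattern never matches.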
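For illustration, here is a minimal sketch of the original approach with only the pattern adjusted to match the \uXXXX form. It assumes the input contains nothing but Basic Multilingual Plane escapes, and note that html_entity_decode will also decode any other HTML entities that happen to be in the string.

```php
<?php
$unicode = '\u0411. \u0426\u044d\u0446\u044d\u0433\u0441\u04af\u0440\u044d\u043d';

// Match a literal backslash, a lowercase "u", and four hex digits of either case,
// then rewrite each escape as a hexadecimal HTML entity such as &#x0411;.
$entities = preg_replace('/\\\\u([0-9a-fA-F]{4})/', '&#x$1;', $unicode);

// Decode the numeric entities into UTF-8 characters.
$utf8string = html_entity_decode($entities, ENT_NOQUOTES, 'UTF-8');

echo $utf8string; // Б. Цэцэгсүрэн
```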
But your conversion script goes from JavaScript-escaped Unicode characters to HTML entities and then back to a PHP string. This might be a saner solution (for this string):
$unicode = '\u0411. \u0426\u044d\u0446\u044d\u0433\u0441\u04af\u0440\u044d\u043d';
echo json_decode('"' . $unicode . '"');
Be careful though, as this might break if the input string contains newlines or double quotes: the wrapped string is then no longer valid JSON, and json_decode returns null.
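One way to sidestep that caveat is to pass only the escape sequences themselves to json_decode and leave the rest of the string untouched. The sketch below is an assumption-laden variant, not part of the original suggestion: decode_unicode_escapes is a hypothetical helper name, and it does not treat a doubled backslash before the u as an escaped backslash.

```php
<?php
// Hypothetical helper: decodes runs of \uXXXX escapes and leaves all other
// characters (quotes, newlines, and so on) exactly as they are.
function decode_unicode_escapes(string $input): string
{
    return preg_replace_callback(
        // One or more consecutive \uXXXX escapes, so that a surrogate pair
        // is handed to json_decode as a single unit.
        '/(?:\\\\u[0-9a-fA-F]{4})+/',
        function (array $m): string {
            // The match contains only escapes, so wrapping it in quotes
            // always yields a syntactically valid JSON string literal.
            $decoded = json_decode('"' . $m[0] . '"');
            // If decoding fails (for example an unpaired surrogate),
            // keep the original text of the run.
            return $decoded === null ? $m[0] : $decoded;
        },
        $input
    );
}

$unicode = '\u0411. \u0426\u044d\u0446\u044d\u0433\u0441\u04af\u0440\u044d\u043d';
echo decode_unicode_escapes($unicode); // Б. Цэцэгсүрэн
```

Because only the matched escapes go through json_decode, a string such as 'He said "\u0411"' followed by a newline is decoded without error.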
Upvotes: 1