batgerel.e

Reputation: 857

Convert unicode special characters to UTF-8

I have a problem converting Unicode escape sequences to UTF-8. Here is my code:

<?php 
    $unicode = '\u0411. \u0426\u044d\u0446\u044d\u0433\u0441\u04af\u0440\u044d\u043d';

    $utf8string = html_entity_decode(preg_replace("/U\+([0-9A-F]{4})/", "&#x\\1;", $unicode), ENT_NOQUOTES, 'UTF-8');

    echo $utf8string;
?>

It prints the input unchanged:

\u0411. \u0426\u044d\u0446\u044d\u0433\u0441\u04af\u0440\u044d\u043d

What did I do wrong? Any advice?

Upvotes: 0

Views: 376

Answers (1)

Evert

Reputation: 99806

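At the very least, your regular expression looks for an uppercase U followed by a plus sign (as in U+0411) and only accepts uppercase hex digits, while all of your escape sequences use a backslash, a lowercase u, and lowercase hex digits (as in \u044d). The pattern never matches, so the string passes through unchanged.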
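If you want to keep the entity-based approach, the pattern has to match the escapes that are actually in the string. Here is a minimal sketch of that fix; the function name decodeUnicodeEscapes is made up for illustration and is not part of the original code. It assumes the input only contains four-digit escapes from the Basic Multilingual Plane (surrogate pairs will not decode this way), and note that html_entity_decode will also decode any real HTML entities that happen to be in the input.

```php
<?php
// Hypothetical helper name, introduced only for this sketch.
function decodeUnicodeEscapes(string $input): string
{
    // Match a literal backslash, a lowercase "u", and four hex digits in either case,
    // then rewrite each escape as a numeric HTML entity such as &#x0411;.
    $entities = preg_replace('/\\\\u([0-9a-fA-F]{4})/', '&#x$1;', $input);

    // Decode the numeric entities into UTF-8 characters.
    return html_entity_decode($entities, ENT_NOQUOTES, 'UTF-8');
}

$unicode = '\u0411. \u0426\u044d\u0446\u044d\u0433\u0441\u04af\u0440\u044d\u043d';
echo decodeUnicodeEscapes($unicode);
```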

But your conversion script goes from JavaScript-escaped Unicode characters to HTML entities and then back to a PHP string. Since \uXXXX is also the escape syntax for JSON strings, this might be a saner solution (for this string):

$unicode = '\u0411. \u0426\u044d\u0446\u044d\u0433\u0441\u04af\u0440\u044d\u043d';
echo json_decode('"' . $unicode . '"');

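Be careful, though: this breaks if the input contains double quotes, literal newlines, or anything else that is not valid inside a JSON string. In that case json_decode returns null instead of the decoded text.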
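One way to sidestep the quote and newline problem is to hand json_decode only the escape sequences themselves and leave the rest of the text alone. The following is a rough sketch, not a drop-in solution; the function name decodeJsonStyleEscapes is made up for illustration. It assumes the input is ordinary text with \uXXXX escapes sprinkled in, not a fully JSON-escaped string, so an escaped backslash before a "u" would still be treated as an escape.

```php
<?php
// Hypothetical helper name, introduced only for this sketch.
function decodeJsonStyleEscapes(string $input): string
{
    // Match runs of consecutive \uXXXX escapes so that surrogate pairs stay together.
    $result = preg_replace_callback(
        '/(?:\\\\u[0-9a-fA-F]{4})+/',
        function (array $match): string {
            // The match contains nothing but escapes, so wrapping it in quotes is safe.
            $decoded = json_decode('"' . $match[0] . '"');

            // Keep the original text if the run is not valid JSON (e.g. a lone surrogate).
            return is_string($decoded) ? $decoded : $match[0];
        },
        $input
    );

    // preg_replace_callback returns null on a regex engine error; fall back to the input.
    return $result ?? $input;
}

echo decodeJsonStyleEscapes('He said "\u0411. \u0426\u044d\u0446\u044d\u0433"' . "\n");
```

Quotes, newlines, and other text outside the escapes never reach json_decode, so they pass through untouched.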

Upvotes: 1
