user1643156

Reputation: 4537

Converting Unicode characters to the "\uxxxx" form

I'm trying to convert characters such as À to their escaped form, e.g. \u00c0. I know json_encode can do this, but it also adds backslashes before special characters such as \ and ". (I don't actually want a JSON object, just the string conversion.)

$str = 'À ß \ Ć " Ď < Ĕ';

For the string above, json_encode returns:

$str = '\u00c0 \u00df \\ \u0106 \" \u010e < \u0114';

If I call stripslashes() on the result, it also strips the backslash before each uxxxx.

Is there a function for this particular conversion? Or what is the simplest way to do it?

Upvotes: 4

Views: 2609

Answers (4)

John Slegers

Reputation: 47101

You can use the following functions to convert in both directions.

Code:

if (!function_exists('codepoint_encode')) {
    function codepoint_encode($str) {
        return substr(json_encode($str), 1, -1);
    }
}

if (!function_exists('codepoint_decode')) {
    function codepoint_decode($str) {
        return json_decode(sprintf('"%s"', $str));
    }
}

How to use:

echo "\nUse JSON encoding / decoding\n";
var_dump(codepoint_encode("我好"));
var_dump(codepoint_decode('\u6211\u597d'));

Output:

Use JSON encoding / decoding
string(12) "\u6211\u597d"
string(6) "我好"
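
As a rough check of how these two helpers behave on the string from the question, the sketch below runs a round trip. It repeats the two functions so that it is self-contained; the variable names are only illustrative. Note that codepoint_encode() still escapes \ and " (the behaviour the question wants to avoid), and that codepoint_decode() expects those characters to be escaped in its input.

```php
<?php
// Same helpers as above, repeated so the snippet runs on its own.
function codepoint_encode($str) {
    // json_encode wraps the result in double quotes; strip exactly one from each end.
    return substr(json_encode($str), 1, -1);
}

function codepoint_decode($str) {
    // Re-wrap in quotes so the input parses as a JSON string literal.
    return json_decode(sprintf('"%s"', $str));
}

$original = 'À ß \ Ć " Ď < Ĕ';
$encoded  = codepoint_encode($original);

echo $encoded, "\n";
// \u00c0 \u00df \\ \u0106 \" \u010e < \u0114

var_dump(codepoint_decode($encoded) === $original);
// bool(true)

// A bare double quote makes the wrapped input invalid JSON, so decoding fails.
var_dump(codepoint_decode('\u00c0 " \u00df'));
// NULL
```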

Upvotes: 3

Thiago Cordeiro

Reputation: 619

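// Decodes \uXXXX escapes back into characters by parsing $str as a JSON string literal.
// Returns null if $str contains an unescaped double quote or backslash.
function convertChars($str) {
    return json_decode("\"$str\"");
}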

Upvotes: 0

David Farrell

Reputation: 3722

Slight modification to @cryptic's answer:

script

$str = 'À ß \ Ć " Ď < Ĕ \\\\uxxx';
echo trim(preg_replace('/\\\\([^u])/', "$1", json_encode($str, JSON_UNESCAPED_SLASHES)), '"');

output

\u00c0 \u00df \ \u0106 " \u010e < \u0114 \\uxxx
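
One caveat worth checking: trim(..., '"') removes every leading and trailing double quote, so an input that itself starts or ends with " loses that character. A minimal sketch of a variant that uses substr() to remove only the two quotes added by json_encode is below; the function name escape_unicode_only is hypothetical, and the regex is the same one used in the script above.

```php
<?php
// Hypothetical helper: same regex as the script above, but substr() removes
// exactly one quote from each end instead of trimming all of them.
function escape_unicode_only($str) {
    $json = json_encode($str, JSON_UNESCAPED_SLASHES);
    // Strip the wrapping quotes first, then undo every backslash escape except \uXXXX.
    return preg_replace('/\\\\([^u])/', '$1', substr($json, 1, -1));
}

$input = 'Ć says "hi"';

// trim() version: the closing quote of the input is lost.
echo trim(preg_replace('/\\\\([^u])/', '$1', json_encode($input, JSON_UNESCAPED_SLASHES)), '"'), "\n";
// \u0106 says "hi

// substr() version: the input's own quotes survive.
echo escape_unicode_only($input), "\n";
// \u0106 says "hi"
```

Like the original, this still turns JSON escapes for control characters (for example \n) into plain letters, so it is only suitable for input without control characters.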

Upvotes: 0

kittycat

Reputation: 15045

$str = 'À ß \ Ć " Ď < Ĕ';

echo trim(preg_replace('/\\\\([^u])/', "$1", json_encode($str)), '"');
// outputs: \u00c0 \u00df \ \u0106 " \u010e < \u0114

I know it uses json_encode(), but it's the easiest way to convert to \uXXXX
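
If unescaping after the fact feels fragile, another option is to pass only the non-ASCII parts of the string through json_encode, so that backslashes, quotes and slashes are never escaped in the first place. The following is a minimal sketch rather than a definitive implementation: the function name unicode_escape is hypothetical, and it assumes the input is valid UTF-8 (with the /u modifier, preg_replace_callback returns null otherwise).

```php
<?php
// Hypothetical helper: escape only non-ASCII characters as \uXXXX and leave
// every ASCII character (including \ and ") untouched.
// Assumes $str is valid UTF-8; otherwise preg_replace_callback returns null.
function unicode_escape($str) {
    return preg_replace_callback(
        '/[^\x00-\x7F]+/u',
        function (array $m) {
            // A run of non-ASCII characters contains nothing json_encode would
            // backslash-escape, so only \uXXXX sequences are produced here.
            return substr(json_encode($m[0]), 1, -1);
        },
        $str
    );
}

echo unicode_escape('À ß \ Ć " Ď < Ĕ'), "\n";
// \u00c0 \u00df \ \u0106 " \u010e < \u0114
```

Characters outside the Basic Multilingual Plane come out as UTF-16 surrogate pairs, because that is how json_encode represents them.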

Upvotes: 1
