Ross McFarlane

Reputation: 4254

PHP Unicode in JSON

I'm sending a JSON POST body to my PHP web service that looks something like this:

{
    "foo": "☺"
}

When I echo out the body in the PHP, I see this:

{
    "foo":"\xe2\x98\xba"
}

I've also tried sending the \uXXXX equivalent:

{
    "foo": "\u263a"
}

This got further, in that the raw JSON string received had "foo":"\\u263a", but after json_decode the value turned to \xe2\x98\xba.

This is causing problems when I come to use the value in a JSON response. I get:

json_encode(): Invalid UTF-8 sequence in argument

At its simplest, this is what happens when I try to JSON encode the string:

> php -r 'echo json_encode("\x98\xba\xe2");'
PHP Warning:  json_encode(): Invalid UTF-8 sequence in argument in Command line code on line 1

My question is: how can I best get this smiley face from one end of my application to the other?

I'd appreciate any help you could offer.

Upvotes: 5

Views: 5378

Answers (3)

PleaseStand

Reputation: 32072

PHP's json_decode() function behaves correctly given your input case, returning the sequence of UTF-8 bytes (E2 98 BA) that represent the character.

However, Apache HTTPD applies the \x escaping (in the function ap_escape_logitem()) before writing each line to the error log, which is where the output of your error_log() test call ended up. As noted in the file server/gen_test_char.c, "all [...] 8-bit chars with the high bit set" are escaped. The escaping exists only in the log; the string held by PHP is unchanged.
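One way to check this without going through the error log is to dump the decoded value as hex. The following is a minimal sketch; the variable names are illustrative only and the JSON literal is taken from the question.

```php
<?php
// Decode a JSON document that uses the \u263a escape for the smiley.
// Single quotes keep the backslash literal, so json_decode() sees the escape.
$decoded = json_decode('{"foo": "\u263a"}', true);

// bin2hex() shows the raw bytes without any log-style \x escaping.
$hex = bin2hex($decoded['foo']);
echo $hex, "\n"; // e298ba

// Re-encoding the decoded value succeeds because the bytes are valid UTF-8.
$reencoded = json_encode($decoded);
echo $reencoded, "\n"; // {"foo":"\u263a"}
```

If the hex dump shows e298ba, the value survived decoding intact and any \x sequences seen elsewhere come from whatever printed it.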

Upvotes: 3

Mifeet

Reputation: 13608

I believe this is the correct behavior of json_encode. If you use the following:

<script>
    alert(
     <?php
       $a = "☺";
       echo json_encode($a);
     ?>
    );
</script>

The HTML output will be alert("\u263a"); and the alert will show ☺, since "\u263a" is a correct representation of the string in JavaScript.

Passing the JSON_UNESCAPED_UNICODE constant as the second argument to json_encode is also an option, but it is only available in PHP 5.4.0 or newer.
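To compare the two output forms side by side, here is a small sketch. The array and variable names are made up for illustration, and the smiley is written as explicit bytes so the result does not depend on how the source file is saved.

```php
<?php
// "\xe2\x98\xba" is the UTF-8 encoding of U+263A (the smiley).
$value = array('foo' => "\xe2\x98\xba");

// Default: non-ASCII characters are written as \uXXXX escapes.
$escaped = json_encode($value);
echo $escaped, "\n"; // {"foo":"\u263a"}

// With the flag (PHP 5.4.0 or newer): the UTF-8 bytes are emitted as-is.
$raw = json_encode($value, JSON_UNESCAPED_UNICODE);
echo $raw, "\n"; // {"foo":"☺"} when viewed in a UTF-8 terminal

// Both forms decode back to the same PHP value.
$same = json_decode($escaped, true) === json_decode($raw, true);
```

Either form is valid JSON, so the flag mainly affects readability and payload size rather than correctness.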

In what scenario do you intend to use the value?


Edit:

php -r 'echo json_encode("\x98\xba\xe2");'

PHP Warning: json_encode(): Invalid UTF-8 sequence in argument in Command line code on line 1

The problem is that the bytes in your test are in the wrong order. It should be

echo json_encode("\xe2\x98\xba"); // this works for me

instead of

echo json_encode("\x98\xba\xe2"); 
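The difference between the two byte orders can also be checked programmatically. This sketch assumes PHP 5.5 or newer, where json_encode() returns false on invalid input and the cause can be read with json_last_error(); older versions behave differently.

```php
<?php
$right = "\xe2\x98\xba"; // lead byte E2 first, then two continuation bytes
$wrong = "\x98\xba\xe2"; // starts with a continuation byte: not valid UTF-8

$ok = json_encode($right);
echo $ok, "\n"; // "\u263a"

// Encoding the reordered bytes fails; read the error code right after the call.
$bad = json_encode($wrong);
$error = json_last_error();

if ($bad === false && $error === JSON_ERROR_UTF8) {
    echo "invalid UTF-8: ", json_last_error_msg(), "\n";
}
```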

Upvotes: 2

Arun Killu

Reputation: 14233

I think that when you encode, you have to use json_encode(array("foo" => "☺"), JSON_UNESCAPED_UNICODE)

Basically, the json_encode function only works with UTF-8 encoded strings, so check the encoding of the string before you encode it, like this:

mb_check_encoding("your string", 'UTF-8');

If it returns false, you can convert the string to UTF-8 (assuming it is ISO-8859-1) using

utf8_encode("your string");
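The check-then-convert idea can be sketched as below. Note two assumptions: ensure_utf8() is a hypothetical helper, not a PHP built-in, and mb_convert_encoding() is swapped in for utf8_encode() because utf8_encode() only handles ISO-8859-1 input and is deprecated as of PHP 8.2. The mbstring extension must be loaded. Conversion only helps when the source encoding is actually known; it would not repair the reordered bytes from the question.

```php
<?php
// Hypothetical helper: return $s unchanged if it is valid UTF-8, otherwise
// convert it from $assumedSource (an assumption the caller must get right).
function ensure_utf8($s, $assumedSource = 'ISO-8859-1')
{
    if (mb_check_encoding($s, 'UTF-8')) {
        return $s;
    }
    return mb_convert_encoding($s, 'UTF-8', $assumedSource);
}

$smiley = "\xe2\x98\xba"; // valid UTF-8 for U+263A, passes through untouched
$latin1 = "caf\xe9";      // "café" in ISO-8859-1, not valid UTF-8

$a = json_encode(ensure_utf8($smiley));
$b = json_encode(ensure_utf8($latin1));
echo $a, "\n"; // "\u263a"
echo $b, "\n"; // "caf\u00e9"
```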

Upvotes: 1
