Reputation: 4254
I'm sending a JSON POST body to my PHP web service that looks something like this:
{
"foo": "☺"
}
When I echo out the body in the PHP, I see this:
{
"foo":"\xe2\x98\xba"
}
I've also tried sending the \uXXXX equivalent:
{
"foo": "\u263a"
}
This got further, in that the raw JSON string received had "foo":"\\u263a", but after json_decode the value turned to \xe2\x98\xba.
This is causing problems when I come to use the value in a JSON response. I get:
json_encode(): Invalid UTF-8 sequence in argument
At its simplest, this is what happens when I try to JSON encode the string:
> php -r 'echo json_encode("\x98\xba\xe2");'
PHP Warning: json_encode(): Invalid UTF-8 sequence in argument in Command line code on line 1
My question is: how can I best get this smiley face from one end of my application to the other?
I'd appreciate any help you could offer.
Upvotes: 5
Views: 5378
Reputation: 32072
PHP's json_decode() function behaves correctly given your input, returning the sequence of UTF-8 bytes (E2 98 BA) that represents the character.
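To check this without going through the error log, the decoded bytes can be inspected directly. This is only a minimal sketch: the variable names are illustrative, and the body is the \u263a form of the payload from the question.

```php
<?php
// Single quotes keep the backslash literal, so json_decode() sees the \u263a escape.
$body = '{"foo": "\u263a"}';

// Decode to an associative array and pull out the value.
$data = json_decode($body, true);
$value = $data['foo'];

// bin2hex() shows the raw bytes with no log escaping in the way.
$hex = bin2hex($value);
$byteLength = strlen($value);

echo $hex, "\n";        // e298ba
echo $byteLength, "\n"; // 3
```

The value is three raw bytes. The \x notation only appears when something such as a log writer escapes them for display.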
However, Apache HTTPD applies the \x escaping (in function ap_escape_logitem()) before writing the line to the error log (as you did for testing purposes using error_log()). As noted in file server/gen_test_char.c, "all [...] 8-bit chars with the high bit set" are escaped.
Upvotes: 3
Reputation: 13608
I believe this is the correct behavior of json_encode. If you use the following:
<script>
alert(
<?php
$a = "☺";
echo json_encode($a);
?>
);
</script>
The HTML output will be alert("\u263a"); and the alert will show ☺, since "\u263a" is a correct representation of the string in JavaScript.
Passing the JSON_UNESCAPED_UNICODE constant as the second parameter of json_encode is also an option, but it is available only in PHP 5.4.0 or newer.
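As a rough illustration of the two options, the sketch below encodes the same string with and without the flag. The variable names are made up for the example, and it assumes PHP 5.4.0 or newer.

```php
<?php
// UTF-8 bytes for U+263A (the smiley), written as escapes so the file encoding does not matter.
$smiley = "\xe2\x98\xba";

// Default behaviour: non-ASCII characters are written as \uXXXX escapes.
$escaped = json_encode($smiley);

// With the flag: the character is emitted as raw UTF-8 bytes.
$unescaped = json_encode($smiley, JSON_UNESCAPED_UNICODE);

// Both forms are valid JSON and decode back to the same string.
$same = json_decode($escaped) === json_decode($unescaped);

echo $escaped, "\n";   // "\u263a"
echo $unescaped, "\n"; // "☺"
```

Either form should reach a JSON client intact. The flag mainly affects readability and payload size.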
In what scenario do you intend to use the value?
Edit:
php -r 'echo json_encode("\x98\xba\xe2");'
PHP Warning: json_encode(): Invalid UTF-8 sequence in argument in Command line code on line 1
The problem is that the bytes in your test are in the wrong order, so they do not form a valid UTF-8 sequence. It should be
echo json_encode("\xe2\x98\xba"); // this works for me
instead of
echo json_encode("\x98\xba\xe2");
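The difference between the two byte orders can be checked programmatically. This sketch assumes PHP 5.5 or newer, where json_encode() returns false on failure. The variable names are illustrative.

```php
<?php
$wrong = "\x98\xba\xe2"; // continuation bytes first: not valid UTF-8
$right = "\xe2\x98\xba"; // lead byte first: U+263A

// @ silences the warning that older PHP versions emit for invalid input.
$wrongResult = @json_encode($wrong);
$wrongError = json_last_error(); // read before the next json_* call resets it

$rightResult = json_encode($right);

var_dump($wrongResult);                    // bool(false)
var_dump($wrongError === JSON_ERROR_UTF8); // bool(true)
echo $rightResult, "\n";                   // "\u263a"
```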
Upvotes: 2
Reputation: 14233
I think when you encode you have to use
json_encode(array("foo" => "☺"), JSON_UNESCAPED_UNICODE)
Basically, json_encode only works on UTF-8 strings, so check the encoding of the string before you encode it, like this:
mb_check_encoding("your string", 'UTF-8');
If it returns false, you can convert the string to UTF-8 (utf8_encode assumes the input is ISO-8859-1) using
utf8_encode("your string");
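The check-then-convert idea might be wrapped up roughly as follows. This is a sketch under assumptions rather than a definitive fix: to_utf8() is a hypothetical helper, it needs the mbstring extension, and it swaps utf8_encode() for mb_convert_encoding() so the source encoding is explicit. The ISO-8859-1 default is only a guess and must match whatever encoding the data really uses.

```php
<?php
// Hypothetical helper: return $s unchanged if it is already valid UTF-8,
// otherwise convert it from an assumed source encoding.
function to_utf8($s, $assumedEncoding = 'ISO-8859-1')
{
    if (mb_check_encoding($s, 'UTF-8')) {
        return $s;
    }
    return mb_convert_encoding($s, 'UTF-8', $assumedEncoding);
}

// Already valid UTF-8: passes through untouched.
$already = to_utf8("\xe2\x98\xba");

// A lone 0xE9 byte is "é" in ISO-8859-1 but invalid as UTF-8, so it gets converted.
$converted = to_utf8("\xe9");

$json = json_encode(array('foo' => $converted));
echo $json, "\n"; // {"foo":"\u00e9"}
```

Note that blindly converting would corrupt a string that is already UTF-8, which is why the check comes first.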
Upvotes: 1