Reputation: 282915
I'm json_encoding some strings. Sometimes they contain binary data. This causes the encoding to fail with error code JSON_ERROR_UTF8. Running the strings through utf8_encode gets around this error. However, ✓ (a Unicode checkmark) gets encoded as \u00e2\u009c\u0093, which when interpreted by JavaScript and rendered in your browser actually looks like â.
How can I fix this? Is there another encoding I can use?
echo json_encode(utf8_encode('✓')); // "\u00e2\u009c\u0093"
Now press F12 and paste that into your JavaScript console (quotes included). It should output â.
Please note that
echo json_encode('✓'); // "\u2713"
works as intended. The issue is that sometimes the string will contain binary data that json_encode can't handle, so I need to sanitize every string without breaking the strings it can already handle.
More examples:
json_encode(chr(200)); // false (bad)
json_encode(utf8_encode(chr(200))); // "\u00c8" (good)
json_encode('✓'); // "\u2713" (good)
json_encode(utf8_encode('✓')); // "\u00e2\u009c\u0093" (bad)
So you see, encoding it works well for some strings and breaks others.
This is strictly for logging. I don't care if the binary data comes out weird, I just don't want it to mess with valid strings.
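To see why utf8_encode mangles the checkmark, it can help to look at the raw bytes. The sketch below assumes the mbstring extension is loaded and uses mb_convert_encoding with ISO-8859-1 as a stand-in for utf8_encode (which is deprecated as of PHP 8.2); both treat every input byte as one Latin-1 character, so the three bytes of ✓ become three separate characters.

```php
<?php
// The checkmark U+2713 is stored as three bytes in UTF-8.
$check = "\xE2\x9C\x93";
echo bin2hex($check), "\n"; // e29c93

// Assumed stand-in for utf8_encode(): read each byte as an ISO-8859-1 character.
$double = mb_convert_encoding($check, 'UTF-8', 'ISO-8859-1');
echo bin2hex($double), "\n"; // c3a2c29cc293

echo json_encode($check), "\n";  // "\u2713"
echo json_encode($double), "\n"; // "\u00e2\u009c\u0093"
```

The second result is the already valid string encoded a second time, which is why JavaScript shows â followed by two invisible control characters.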
Upvotes: 0
Views: 618
Reputation: 282915
Running strings through this function
function _utf8($str) {
    // Only re-encode strings that are not already valid UTF-8.
    if (!mb_check_encoding($str, 'UTF-8')) {
        return utf8_encode($str);
    }
    return $str;
}
(taken and modified from here)
This seems to give the results I'm after. Checkmarks are left alone, but chr(200) and other weirdness are encoded:
json_encode(_utf8(chr(200))); // "\u00c8"
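A minimal sketch of how this check could be applied to log strings, for anyone who wants to try it. The name to_utf8_for_log and the sample inputs are hypothetical, and the ISO-8859-1 conversion is an assumed replacement for utf8_encode, which is deprecated as of PHP 8.2.

```php
<?php
// Hypothetical helper: leave valid UTF-8 alone, reinterpret anything else as Latin-1.
function to_utf8_for_log(string $str): string
{
    if (mb_check_encoding($str, 'UTF-8')) {
        return $str;
    }
    // Assumed equivalent of utf8_encode($str).
    return mb_convert_encoding($str, 'UTF-8', 'ISO-8859-1');
}

echo json_encode(to_utf8_for_log("\xE2\x9C\x93")), "\n"; // "\u2713"
echo json_encode(to_utf8_for_log(chr(200))), "\n";       // "\u00c8"
```

One caveat: a string that mixes valid UTF-8 with a single stray byte fails the check as a whole, so its valid multibyte characters get re-encoded as well. For logging that is probably acceptable.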
Upvotes: 1
Reputation: 30595
EDIT: This question is unanswerable. Encoding arbitrary binary data is one thing; keeping UTF-8 characters intact is something completely separate. What's to stop the byte sequence 0xe29c93 from being interpreted as ✓ when it shows up in your binary data?
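The ambiguity can be shown directly: three bytes taken from a binary blob and a checkmark written as text are the same string as far as PHP is concerned. The variable names below are illustrative only.

```php
<?php
// Three bytes that might come from arbitrary binary data.
$fromBinary = "\xE2\x9C\x93";
// The checkmark written via its Unicode code point.
$fromText = "\u{2713}";

var_dump($fromBinary === $fromText);               // bool(true)
var_dump(mb_check_encoding($fromBinary, 'UTF-8')); // bool(true)
echo json_encode($fromBinary), "\n";               // "\u2713"
```

Any validity check will therefore treat such binary data as text; only knowledge of where the bytes came from can tell the two apart.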
According to the json_encode PHP reference page, you can use the following syntax to encode Unicode characters:
json_encode($data, JSON_UNESCAPED_UNICODE);
This should make json_encode pass Unicode characters through unescaped.
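As a quick check of what the flag does, the sketch below encodes a checkmark and an invalid byte. JSON_UNESCAPED_UNICODE only changes how valid characters are written, so invalid input still fails. JSON_INVALID_UTF8_SUBSTITUTE is a separate flag, not part of the suggestion above, that is available from PHP 7.2 and replaces invalid bytes with U+FFFD; it is shown here only as an assumption about what might suit a logging use case.

```php
<?php
$check = "\xE2\x9C\x93"; // valid UTF-8 for U+2713
$bad = chr(200);         // a lone 0xC8 byte, not valid UTF-8

echo json_encode($check, JSON_UNESCAPED_UNICODE), "\n"; // the checkmark, written literally in quotes
var_dump(json_encode($bad, JSON_UNESCAPED_UNICODE));    // bool(false)
var_dump(json_last_error() === JSON_ERROR_UTF8);        // bool(true)

// Assumes PHP 7.2 or later for this flag.
$substituted = json_encode($bad, JSON_INVALID_UTF8_SUBSTITUTE);
var_dump($substituted !== false);                       // bool(true)
```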
Upvotes: 0