Ryan
Ryan

Reputation: 10101

Convert PHP json_encode UTF8 hex entities in MySQL

In php, json_encode() will encode UTF8 in hex entities, e.g.

json_encode('中'); // become "\u4e2d"

Assume the data "\u4e2d" is now being stored in MySQL, is it possible to convert back from "\u4e2d" to without using PHP, just plain MySQL?

Upvotes: 1

Views: 325

Answers (2)

Álvaro González
Álvaro González

Reputation: 146390

Encoding non-ASCII characters as JavaScript entities is only one of the different things that JSON encoders will do—and it isn't actually mandatory:

echo json_encode('中'), PHP_EOL;
echo json_encode('中', JSON_UNESCAPED_UNICODE), PHP_EOL;
echo json_encode('One "Two" Three \中'), PHP_EOL;
"\u4e2d"
"中"
"One \"Two\" Three \\\u4e2d"

Thus the only safe decoding approach is using a dedicated JSON decoder. MySQL bundles the required abilities since 5.7.8:

SET @input = '"One \\"Two\\" Three \\\\\\u4e2d"';
SELECT @input AS json_string, JSON_UNQUOTE(@input) AS original_string;
json_string                     original_string
============================    ===================
"One \"Two\" Three \\\u4e2d"    One "Two" Three \中

(Demo)

If you have an older version you'll have to resort to more elaborate solutions (you can Google for third-party UDF's).

In any case, I suggest you get back to the design table. It's strange that you need JSON data in a context where you don't have a proper JSON decoder available.

Upvotes: 0

Peter Gulutzan
Peter Gulutzan

Reputation: 485

On my configuration, select hex('中'); returns E4B8AD which is the hex code of the UTF8 bytes. Naturally it is not the same as the hex of the code point 4e2d, but you can get that with select hex(cast('中' as char(1) character set utf16));.

Update: The questioner has edited the question, to what looks to me like a completely different question, now it's apparently: how to get '中' given a string containing '\u4e2d' when 4e2d is the code point of 中 and the default character set is utf8. Okay, that is select cast(char(conv(right('\u4e2d',4),16,10) using utf16) as char(1) character set utf8);

Upvotes: 1

Related Questions