mike

Reputation: 5213

JSON unicode characters conversion

I came across this strange JSON which I can't seem to decode. To simplify things, let's say it's a JSON string:

"\uffffffe2\uffffff94\uffffff94\uffffffe2\uffffff94\uffffff80\uffffffe2\uffffff94\uffffff80 mystring"

After decoding, it should look like this:

└── mystring

Neither JS nor PHP seems to convert it correctly.

js> JSON.parse('"\uffffffe2\uffffff94\uffffff94\uffffffe2\uffffff94\uffffff80\uffffffe2\uffffff94\uffffff80 mystring"')
ffe2ff94ff94ffe2ff94ff80ffe2ff94ff80 mystring

PHP behaves the same

php> json_decode('"\uffffffe2\uffffff94\uffffff94\uffffffe2\uffffff94\uffffff80\uffffffe2\uffffff94\uffffff80 mystring"')
ffe2ff94ff94ffe2ff94ff80ffe2ff94ff80 mystring

Any ideas on how to properly parse this JSON string would be welcome.

Upvotes: 1

Views: 2154

Answers (1)

Bogdan Savluk

Reputation: 6312

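This is not a valid JSON string: JSON allows exactly 4 hex digits after \u. The results from both PHP and JS are correct: each parser reads \uffff as one escape and keeps the remaining characters (for example ffe2) as literal text.

It is not possible to decode this using standard functions.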

Where did you get this JSON string?

As for the correct JSON for the string you want: it should be "\u2514\u2500\u2500 mystring", or just "└── mystring" (JSON allows any Unicode character unescaped in a string, except ", \ and control characters).

Also, if you need to escape a character outside the Basic Multilingual Plane (a code point above U+FFFF), it becomes two escape codes (a UTF-16 surrogate pair). For example, "𩄎" would be "\ud864\udd0e" when escaped.
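
To illustrate where the two escape codes come from, here is a minimal sketch of the standard UTF-16 surrogate computation. The helper name toJsonEscape is made up for this example, and it assumes the input is a code point above U+FFFF.

```javascript
// Hypothetical helper: builds the JSON escape for a code point above U+FFFF.
function toJsonEscape(codePoint) {
  const offset = codePoint - 0x10000;
  const high = 0xd800 + (offset >> 10);   // upper 10 bits go into the high surrogate
  const low = 0xdc00 + (offset & 0x3ff);  // lower 10 bits go into the low surrogate
  return "\\u" + high.toString(16) + "\\u" + low.toString(16);
}

const escaped = toJsonEscape(0x2910e);     // code point of the example character
const decoded = JSON.parse('"' + escaped + '"');

console.log(escaped);                      // the two 4-digit escapes
console.log(decoded.codePointAt(0).toString(16));
```

JSON.parse joins the two escapes back into a single character, so decoded.codePointAt(0) is the original code point again.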

So, if you really need to decode the string above, you can fix it before decoding by replacing \uffffffe2 with \uffff\uffe2 via a regexp (for JS it would be something like: s.replace(/(\\u[A-Fa-f0-9]{4})([A-Fa-f0-9]{4})/gi,'$1\\u$2') ).
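
As a runnable sketch of that regexp fix (the function name splitLongEscapes is made up for illustration): it only makes the string syntactically valid JSON, and it assumes that every \u followed by 8 hex digits is one of the broken escapes, so a valid escape that happens to be followed by four hex-looking characters would be split as well.

```javascript
// Hypothetical helper: turns each "\uXXXXXXXX" into "\uXXXX\uXXXX".
function splitLongEscapes(s) {
  return s.replace(/(\\u[A-Fa-f0-9]{4})([A-Fa-f0-9]{4})/g, "$1\\u$2");
}

// String.raw keeps the backslashes exactly as they appear in the question.
const broken = String.raw`"\uffffffe2\uffffff94\uffffff94\uffffffe2\uffffff94\uffffff80\uffffffe2\uffffff94\uffffff80 mystring"`;

const fixed = splitLongEscapes(broken);
const parsed = JSON.parse(fixed);

console.log(fixed);
// Print the code units, since U+FFFF and friends are not readable characters.
console.log(Array.from(parsed, (c) => c.charCodeAt(0).toString(16)).join(" "));
```

The parsed string now contains code units such as U+FFFF and U+FFE2 instead of leftover literal text, but it is still not the box-drawing string from the question.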

But in any case, the character codes in the string above do not look right.
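
One possible reading of those codes, offered as an assumption rather than something the question confirms: the low bytes e2 94 94 and e2 94 80 are exactly the UTF-8 encodings of U+2514 and U+2500, so each \uffffffXX may be a UTF-8 byte that was sign-extended to 32 bits by whatever produced the string. Under that assumption, a sketch like the following recovers the expected text (decodeSignExtendedUtf8 is a made-up name; TextDecoder is available in browsers and in current Node.js).

```javascript
// Hypothetical helper, assuming every "\uffffffXX" is one sign-extended UTF-8 byte.
function decodeSignExtendedUtf8(s) {
  const decoder = new TextDecoder("utf-8");
  // Handle each run of consecutive broken escapes as one UTF-8 byte sequence.
  return s.replace(/(?:\\uffffff[0-9a-f]{2})+/gi, (run) => {
    const bytes = Array.from(
      run.matchAll(/\\uffffff([0-9a-f]{2})/gi),
      (m) => parseInt(m[1], 16) // keep only the low byte
    );
    return decoder.decode(Uint8Array.from(bytes));
  });
}

const input = String.raw`"\uffffffe2\uffffff94\uffffff94\uffffffe2\uffffff94\uffffff80\uffffffe2\uffffff94\uffffff80 mystring"`;

// The repaired text is valid JSON, so the standard parser can take over.
const result = JSON.parse(decodeSignExtendedUtf8(input));
console.log(result);
```

Only bytes of 0x80 and above would be sign-extended this way, so the decoded characters are never " or \ and the repaired text can be passed to JSON.parse as is.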

Upvotes: 3
