ruinernix

Reputation: 690

JSON.parse unexpected character with special characters in string?

I am having some trouble using JSON.parse on certain characters. I receive this data from an API and have no means of forcing any particular encoding on the server side; the data is provided to me as-is.

This is the JSON in question:

{"name": "»»»»»»»"}

I created a jsfiddle with the JSON data and a basic JSON.parse call, which fails with "Unexpected token in JSON at position 11". (There are some special characters in there that you probably won't see in your browser; jsfiddle will show them.)

https://jsfiddle.net/4u1LtvLm/2/

How would I go about fixing this string prior to doing JSON.parse on it, without losing the special characters?

EDIT: modified jsfiddle and json to only contain the string causing trouble, so it's less confusing for everyone.

Upvotes: 1

Views: 27233

Answers (3)

ruinernix

Reputation: 690

My solution was this: https://stackoverflow.com/a/40558081/370709

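// Replace every character outside the range \0 to ~ (non-ASCII, plus DEL)
// with its JSON \uXXXX escape so the string is plain ASCII before parsing.
function escapeUnicode(str) {
    return str.replace(/[^\0-~]/g, function(ch) {
        // Zero-pad the UTF-16 code unit to four hex digits.
        return "\\u" + ("0000" + ch.charCodeAt(0).toString(16)).slice(-4);
    });
}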

Problem solved!
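For anyone who wants to check that nothing is lost along the way, here is a minimal sketch of the round trip. The sample string `raw` is made up for illustration (it is not the exact payload from the API), and the function is repeated only so the snippet runs on its own:

```javascript
// Same helper as above, repeated so this snippet is self-contained.
function escapeUnicode(str) {
    return str.replace(/[^\0-~]/g, function (ch) {
        return "\\u" + ("0000" + ch.charCodeAt(0).toString(16)).slice(-4);
    });
}

// Hypothetical sample input, not the real API response.
const raw = '{"name": "»»»"}';

// After escaping, the JSON text contains only ASCII characters.
const escaped = escapeUnicode(raw);
console.log(escaped); // {"name": "\u00bb\u00bb\u00bb"}

// JSON.parse decodes the \uXXXX escapes, so the original characters come back.
const parsed = JSON.parse(escaped);
console.log(parsed.name); // »»»
```

Because the replacement works on UTF-16 code units, characters outside the Basic Multilingual Plane end up as two \uXXXX escapes (a surrogate pair), which JSON.parse also decodes back correctly.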

Upvotes: 3

pid

Reputation: 11607

The problem at position 423 is this character:

»

This is not a standard ASCII character. JSON text is supposed to be UTF-8, so you should be able to have a character like this in a valid JSON string. But, as it seems, you must escape it properly.

I would convert the string by replacing the non-ASCII characters with their escaped form (JSON uses \uXXXX escapes, so » becomes \u00bb). Only then run it through the JSON parser. The parser decodes \uXXXX escapes, so the parsed data will contain the original characters again.

If you use some other escape format that the parser does not understand, the data will still contain the escape sequences afterwards, and depending on how you consume them they may need to be converted back.

EDIT: valid JSON text should in fact be UTF-8, but that's only what the standard says. It is possible that a lousy non-standard implementation of a parser does not honor this and requires ASCII instead. Which obviously means that there's a lake of tears ahead in using it.

EDIT 2: Oh, wait. This is on node.js? Well, that's not a lousy implementation at all; in fact it's one of the best (fastest and most robust) I've ever come across... Consider converting to ASCII only as a last resort. If possible, identify the true culprit and solve the problem without conversion. As long as it is a UTF-8 string it should work right out of the box. If it's in some other encoding, convert it to UTF-8 (not ASCII... forget about ASCII... node.js should work perfectly with UTF-8).
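To illustrate the point about UTF-8, here is a rough sketch for node.js. It assumes the API response is available as raw bytes; the `bytes` buffer below is simulated and the payload is made up for the example:

```javascript
// Simulated raw response bytes (assumption: the real ones come from the HTTP response).
const bytes = Buffer.from('{"name": "»»»"}', 'utf8');

// Decoded as UTF-8, the text parses without any escaping.
const text = bytes.toString('utf8');
const data = JSON.parse(text);
console.log(data.name); // »»»

// Decoded with the wrong encoding, it still parses, but the value is garbled:
// each » is two bytes in UTF-8 (0xC2 0xBB), which Latin-1 reads as two characters.
const wrong = JSON.parse(bytes.toString('latin1'));
console.log(wrong.name); // Â»Â»Â»
```

So a wrong decoding step does not necessarily make JSON.parse throw; it can just as well produce silently mangled values, which is another reason to pin down the real encoding first.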

BTW, by posting the string on the web you intrinsically lose the original encoding and force it to UTF-8, which may be the reason why we cannot reproduce your problem.

EDIT 3: If in doubt, use this encoder.

Upvotes: 0

Nils Schlüter

Reputation: 1436

JSON.parse needs to get a string which consists only of unicode characters (see Json parsing with unicode characters).

For you the JSON.parse method fails, because your string contains non-unicode characters. If you paste your string into http://jsonparseronline.com/ you will see that it fails because of the character, which is the character the browser displays if the string is not correctly encoded.

So, if you don't have a way to change the endcoding of your string, you won't be able to do this. You can try something like this to change the encoding, but ti give a definite answer you would need to know how your string is encoded in the first place
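As a sketch of what "changing the encoding" could look like in node.js: try strict UTF-8 first and fall back to another encoding only if that fails. The helper name `decodeBytes` and the choice of Latin-1 as the fallback are assumptions for this example; the real source encoding has to be found out from the API.

```javascript
// Hypothetical helper: decode raw bytes to a string before JSON.parse.
function decodeBytes(bytes) {
    try {
        // fatal: true makes the decoder throw on invalid UTF-8
        // instead of silently inserting U+FFFD replacement characters.
        return new TextDecoder('utf-8', { fatal: true }).decode(bytes);
    } catch (err) {
        // Assumption: anything that is not valid UTF-8 is treated as Latin-1.
        return Buffer.from(bytes).toString('latin1');
    }
}

// Two simulated responses carrying the same text in different encodings.
const utf8Bytes = Buffer.from('{"name": "»"}', 'utf8');
const latin1Bytes = Buffer.from('{"name": "»"}', 'latin1');

console.log(JSON.parse(decodeBytes(utf8Bytes)).name);   // »
console.log(JSON.parse(decodeBytes(latin1Bytes)).name); // »
```

The fallback is a guess by design. Bytes that happen to be valid UTF-8 but were meant as something else will still decode "successfully", so this only helps when you have a good idea of which encodings the server might use.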

Upvotes: 0
