Max

Reputation: 15985

JSON produced by Ruby not compatible with JavaScript's JSON parser

I am running into an issue where JSON produced by a Ruby script fails to parse with JavaScript's JSON.parse. Consider the following example:

# Ruby
require 'json'
hash = {}
hash["key"] = "value with \u001a unicode"
hash.to_json
=> '{"key":"value with \u001a unicode"}'

// JavaScript
JSON.parse('{"key":"value with \u001a unicode"}')
=> JSON.parse: bad control character in string literal at line 1 column 2 of the JSON data

The issue is the Unicode character \u001a. The fix would be to escape \u001a as \\u001a, but Ruby inserts the \u001a into the string automatically, so I can't reliably post-process the result. Any ideas on how to solve this?

Please note that I wish to call JSON.parse inside a JavaScript execution environment, not inside Ruby's interpreter.

Upvotes: 3

Views: 682

Answers (3)

Alex Pan

Reputation: 4621

According to the RFC:

JSON text SHALL be encoded in Unicode. The default encoding is UTF-8.

I ran your code in irb and got the following:

1.9.3-p484 :001 > require 'json'
 => true
1.9.3-p484 :002 >
1.9.3-p484 :003 >   hash = {}
 => {}
1.9.3-p484 :004 > hash["key"] = "value with \u001a unicode"
 => "value with \u001A unicode"
1.9.3-p484 :005 > hash.to_json
 => "{\"key\":\"value with \\u001a unicode\"}"

Then, running the returned string in a JavaScript console, I get the following:

> JSON.parse("{\"key\":\"value with \\u001a unicode\"}")
> Object {key: "value with  unicode"}

It returns an object. To get the value containing the Unicode character, access the key on the parsed object:

> str = JSON.parse("{\"key\":\"value with \\u001a unicode\"}")
> Object {key: "value with  unicode"}
> str.key
> "value with  unicode"

Upvotes: 0

Chris Heald

Reputation: 62698

The short version is that you're interpreting your string as a JavaScript expression before attempting to decode it as JSON.

U+001A is a control character. RFC 4627 explicitly disallows unescaped control characters (U+0000 through U+001F) in quoted strings. Your problem here is not that the JSON is invalid, but that you are unescaping your control characters before attempting to parse them as JSON.

When you dump the string "\u001a" from Ruby and copy and paste it into a JavaScript interpreter, the escape sequence is translated into an unescaped control character, which is not a valid character in a JSON string. Non-prohibited characters work just fine: you can happily JSON.parse('["\u0020"]'), for example.
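To make the difference concrete, here is a minimal sketch that can be run in Node or a browser console. The variable names are only illustrative, and the String.raw variant is an alternative I'm adding for comparison, not something from the question.

```javascript
// The JavaScript literal '\u001a' is decoded by the JS parser first, so
// JSON.parse receives a raw U+001A control character inside the string.
const interpreted = '{"key":"value with \u001a unicode"}';

// Doubling the backslash keeps the six characters \u001a in the text,
// which is what Ruby's to_json actually emitted.
const escaped = '{"key":"value with \\u001a unicode"}';

// String.raw leaves backslashes alone, so pasted output can stay as-is.
const raw = String.raw`{"key":"value with \u001a unicode"}`;

let interpretedError = null;
try {
  JSON.parse(interpreted);
} catch (err) {
  interpretedError = err; // the raw control character is rejected
}

const fromEscaped = JSON.parse(escaped);
const fromRaw = JSON.parse(raw);

console.log(interpretedError instanceof SyntaxError); // true
console.log(fromEscaped.key.charCodeAt(11));          // 26, i.e. U+001A
console.log(fromRaw.key === fromEscaped.key);         // true
```

Note that String.raw only helps for hand-pasted text that contains no backticks or `${` sequences; anything arriving over HTTP or from a file never goes through the JavaScript literal parser, so it does not need this treatment.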

However, if you don't interpret the string as JavaScript, and instead read it as raw bytes, it will parse correctly.

$ irb
irb(main):001:0> require 'json'
=> true
irb(main):003:0> open("out.json", "w") {|f| f.print JSON.dump(["\u001a"]) }
=> nil

$ node -e 'require("fs").readFile("out.json", function(err, data) { console.log(JSON.parse(data)); });'
[ '\u001a' ]

If you're going to be copy-pasting, you need to copy an escaped version of the string, so that when your JavaScript engine parses the string literal, the double-escaped sequences unescape to escape sequences rather than to raw characters. So, rather than copying the output of JSON.dump(["\u001a"]), you should copy the output of puts JSON.dump(["\u001a"]).inspect, which will correctly escape any escape sequences in the string.
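The same double-escaping idea can be checked from the JavaScript side. This is only a sketch: `jsonText` is a hypothetical stand-in for the text Ruby produced, and using JSON.stringify to build the pasteable literal is my assumption of a rough JavaScript counterpart to Ruby's inspect, not something the Ruby code itself does.

```javascript
// jsonText stands in for what Ruby's JSON.dump produced: it contains the six
// characters \u001a, not a raw control character.
const jsonText = String.raw`["\u001a"]`;

// Encoding the JSON text as a JSON string escapes its quotes and backslashes,
// giving a literal that is safe to paste into JavaScript source.
const pasteable = JSON.stringify(jsonText);
console.log(pasteable); // "[\"\\u001a\"]"

// Decoding once undoes the extra layer; decoding again parses the real JSON.
const recoveredText = JSON.parse(pasteable);
const value = JSON.parse(recoveredText);

console.log(recoveredText === jsonText); // true
console.log(value[0].charCodeAt(0));     // 26, i.e. U+001A
```

The printed literal has the same shape as the output of the inspect call above: every backslash and quote in the JSON text carries one extra level of escaping, which the JavaScript parser strips again when the literal is pasted.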

Upvotes: 4

jon snow

Reputation: 3072

For me, the Ruby code in the question outputs "{\"key\":\"value with \\u001a unicode\"}".

JSON.parse is also able to parse it, and gives Object {key: "value with unicode"}.

Upvotes: 0
