bklimt

Reputation: 1842

Ruby JSON.parse returning incorrect data for unicode

I'm trying to parse some JSON containing escaped Unicode characters with JSON.parse, but on one machine, using json/ext, it returns incorrect values. For example, \u2030 should decode to the UTF-8 bytes E2 80 B0, but instead I get 01 00 00. It fails with both the escaped "\\u2030" and the unescaped "\u2030".
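To show where E2 80 B0 comes from, here is a minimal sketch that encodes U+2030 by hand using the standard three-byte UTF-8 layout and compares the result with Ruby's built-in `pack("U")` encoder. The helper name `utf8_three_byte` is made up for illustration and only handles code points U+0800..U+FFFF (it does not reject surrogates).

```ruby
# Hypothetical helper: encode a code point in U+0800..U+FFFF as three
# UTF-8 bytes by hand (surrogate check omitted for brevity).
def utf8_three_byte(cp)
  raise ArgumentError, "expected U+0800..U+FFFF" unless (0x800..0xFFFF).cover?(cp)
  [
    0xE0 | (cp >> 12),         # 1110xxxx: top 4 bits
    0x80 | ((cp >> 6) & 0x3F), # 10xxxxxx: middle 6 bits
    0x80 | (cp & 0x3F)         # 10xxxxxx: low 6 bits
  ]
end

expected = utf8_three_byte(0x2030)
# Ruby's own UTF-8 encoder; .to_a keeps this working on 1.9, where
# String#bytes returns an Enumerator rather than an Array.
builtin = [0x2030].pack("U").bytes.to_a

puts expected.map { |b| format("%02X", b) }.join(" ") # prints "E2 80 B0"
puts expected == builtin                               # prints "true"
```

In decimal these are 226, 128, 176, which is what json/pure returns below.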

1.9.2p180 :001 > require 'json/ext'
 => true 
1.9.2p180 :002 > s = JSON.parse '{"f":"\\u2030"}'
 => {"f"=>"\u0001\u0000\u0000"} 
1.9.2p180 :003 > s["f"].encoding
 => #<Encoding:UTF-8> 
1.9.2p180 :004 > s["f"].valid_encoding?
 => true 
1.9.2p180 :005 > s["f"].bytes.map do |x| x; end
 => [1, 0, 0] 

It works on my other machine, which has the same version of Ruby and similar environment variables. The Gemfile.lock is identical on both machines, including json (= 1.6.3). It also works with json/pure on both machines.

1.9.2p180 :001 > require 'json/pure'
 => true 
1.9.2p180 :002 > s = JSON.parse '{"f":"\\u2030"}'
 => {"f"=>"‰"} 
1.9.2p180 :003 > s["f"].encoding
 => #<Encoding:UTF-8> 
1.9.2p180 :004 > s["f"].valid_encoding?
 => true
1.9.2p180 :005 > s["f"].bytes.map do |x| x; end
 => [226, 128, 176] 

So is there something else in my environment or setup that could be causing it to parse incorrectly?
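One way to narrow down the difference is to run the same small probe on both machines and diff the output. This is a sketch, not a definitive diagnostic: `json_unicode_ok?` is a hypothetical helper, and it assumes `[0x2030].pack("U")` as the reference encoding. The `CFLAGS` line reports the compiler flags Ruby itself was built with, which can differ between machines even when the Ruby version and Gemfile.lock match.

```ruby
require 'json'
require 'rbconfig'

# Hypothetical probe: parse a known \u escape and compare the resulting
# bytes with the UTF-8 bytes Ruby produces for the same code point.
def json_unicode_ok?
  parsed   = JSON.parse('{"f":"\\u2030"}')["f"]
  expected = [0x2030].pack("U")
  parsed.bytes.to_a == expected.bytes.to_a
end

puts "ruby:   #{RUBY_VERSION}p#{RUBY_PATCHLEVEL}"
puts "json:   #{JSON::VERSION}"
puts "CFLAGS: #{RbConfig::CONFIG['CFLAGS']}"
puts "unicode escapes decode correctly: #{json_unicode_ok?}"
```

On the failing machine, the last line would print `false`, since `[1, 0, 0]` does not match `[226, 128, 176]`.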

Upvotes: 5

Views: 2426

Answers (2)

Michael Pilat

Reputation: 6520

I recently ran into this same problem and tracked it down to this Ruby bug, which is caused by the way this buffer is declared in Ruby 1.9.2 and how GCC optimizes it. It's fixed in this commit.

To fix it, either recompile Ruby with -O0 (optimization disabled) or upgrade to a newer Ruby (1.9.3 or later).
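If you want an application to refuse to start on an affected interpreter, a version guard is one option. This is a minimal sketch assuming the 1.9.3 cutoff from this answer; `ruby_new_enough?` and `MIN_RUBY` are hypothetical names. Note that a version check cannot tell whether a 1.9.2 build was recompiled with -O0, so probing the actual parser behavior is more reliable in that case.

```ruby
require 'rubygems' # provides Gem::Version (loaded by default on 1.9+)

# Hypothetical minimum version, taken from the upgrade advice above.
MIN_RUBY = Gem::Version.new("1.9.3")

# Compare a version string against the minimum using RubyGems' ordering.
def ruby_new_enough?(version = RUBY_VERSION)
  Gem::Version.new(version) >= MIN_RUBY
end

unless ruby_new_enough?
  abort "Ruby #{RUBY_VERSION} may mis-decode \\u escapes in json/ext; " \
        "upgrade to 1.9.3+ or rebuild Ruby with -O0"
end
```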

Upvotes: 5

Tom Meinlschmidt

Reputation: 207

Try upgrading your json gem to at least 1.6.6, or to the newest release, 1.7.1.
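To confirm which json gem version is actually loaded at runtime, a check like the following can help. It is a sketch assuming the 1.6.6 floor suggested here; `json_gem_new_enough?` and `MIN_JSON` are hypothetical names, and whether 1.6.6 addresses this specific bug is not established (the other answer attributes it to Ruby itself).

```ruby
require 'rubygems'
require 'json'

# Hypothetical minimum json gem version, from the suggestion above.
MIN_JSON = Gem::Version.new("1.6.6")

# Compare the loaded json gem version against the minimum.
def json_gem_new_enough?(version = JSON::VERSION)
  Gem::Version.new(version) >= MIN_JSON
end

unless json_gem_new_enough?
  warn "json #{JSON::VERSION} is older than #{MIN_JSON}; consider upgrading"
end
```

With Bundler, a Gemfile line such as `gem 'json', '>= 1.6.6'` enforces the same floor at install time.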

Upvotes: 1
