Reputation: 812
I'm having a problem with special characters when casting a hash to a JSON string.
Everything works fine with Ruby 2.0 / Rails 3.2.21, that is,
puts "“".to_json
#"\u201c"
But with Ruby 2.3.0 / Rails 4.2.5.1 I get
puts "“".to_json
#"“"
Is there any way to force Ruby 2.3.0 to convert special characters to Unicode-style escapes (\uXXXX)?
Remark:
Notice that in Ruby 2.3 / Rails 4, we get
"“".to_json.bytesize == 5 #true
However, in 2.0 we get
"“".to_json.bytesize == 8 #true
So clearly it is the string itself that differs, not just the way it is displayed.
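For reference, the byte-level difference is easy to see in plain Ruby (no Rails involved): the escaped form is pure ASCII, while the unescaped one keeps the raw UTF-8 bytes of U+201C between the quotes.
# Escaped form: 8 ASCII characters  "  \  u  2  0  1  c  "
'"\u201c"'.bytesize   # => 8
'"\u201c"'.bytes      # => [34, 92, 117, 50, 48, 49, 99, 34]
# Unescaped form: quote + the three UTF-8 bytes of U+201C + quote
"\"“\"".bytesize      # => 5
"\"“\"".bytes         # => [34, 226, 128, 156, 34]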
Upvotes: 10
Views: 2465
Reputation: 121000
I ❤ Rails (just kidding.)
In Rails 3 there was a hilarious method that damaged UTF-8 in JSON output. Rails 4, thanks to DHH, was freed of this drawback.
So, if one wants a trip back in time, the simplest way is to monkeypatch ::ActiveSupport::JSON::Encoding.escape:
module ::ActiveSupport::JSON::Encoding
  # Rails 3's escape, restored more or less verbatim. Note that it relies on
  # escape_regex and ESCAPED_CHARS being defined in this module, as they were
  # in Rails 3.
  def self.escape(string)
    if string.respond_to?(:force_encoding)
      string = string.encode(::Encoding::UTF_8, :undef => :replace)
                     .force_encoding(::Encoding::BINARY)
    end
    json = string.
      gsub(escape_regex) { |s| ESCAPED_CHARS[s] }.
      gsub(/([\xC0-\xDF][\x80-\xBF]|
             [\xE0-\xEF][\x80-\xBF]{2}|
             [\xF0-\xF7][\x80-\xBF]{3})+/nx) { |s|
        # codepoints -> big-endian 16-bit -> hex -> \uXXXX
        s.unpack("U*").pack("n*").unpack("H*")[0].gsub(/.{4}/n, '\\\\u\&')
      }
    json = %("#{json}")
    json.force_encoding(::Encoding::UTF_8) if json.respond_to?(:force_encoding)
    json
  end
end
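Both snippets turn non-ASCII characters into \uXXXX escapes with the same unpack/pack chain; here is a standalone sketch of what it does to a single character (plain Ruby, no Rails; since pack("n*") produces 16-bit values, this covers BMP characters only):
s = "“"                                           # U+201C, three bytes in UTF-8
s.unpack("U*")                                    # => [8220]   the codepoint (0x201C)
s.unpack("U*").pack("n*")                         # the two bytes 0x20 0x1C (big-endian 16-bit)
hex = s.unpack("U*").pack("n*").unpack("H*")[0]   # => "201c"   those bytes as hex digits
hex.gsub(/.{4}/, '\\\\u\&')                       # => "\\u201c"  each group of 4 digits prefixed with \u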
A more robust solution would be to leave Rails alone and "corrupt" the result yourself, after to_json has produced it:
class String
  # Re-escape non-ASCII characters the way Rails 3 did. The receiver is
  # expected to be the output of to_json, i.e. it already carries its
  # surrounding quotes, so they are not added again here.
  def rails3_style
    string = encode(::Encoding::UTF_8, :undef => :replace).
             force_encoding(::Encoding::BINARY)
    json = string.
      gsub(/([\xC0-\xDF][\x80-\xBF]|
             [\xE0-\xEF][\x80-\xBF]{2}|
             [\xF0-\xF7][\x80-\xBF]{3})+/nx) { |s|
        # codepoints -> big-endian 16-bit -> hex -> \uXXXX
        s.unpack("U*").pack("n*").unpack("H*")[0].gsub(/.{4}/n, '\\\\u\&')
      }
    json.force_encoding(::Encoding::UTF_8) if json.respond_to?(:force_encoding)
    json
  end
end
puts "“".to_json.rails3_style
#⇒ "\u201c"
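For what it's worth, the escaped form is still perfectly valid JSON and parses back to the original character, so nothing is lost. A quick check with the standard json gem (the literal below stands in for the output above):
require 'json'

escaped = '"\u201c"'                      # the rails3_style output from above
JSON.parse("[#{escaped}]").first          # => "“"
JSON.parse("[#{escaped}]").first == "“"   # => true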
I can hardly understand why anybody would want to do this on purpose, but here is the solution.
Upvotes: 6