Reputation: 4802
I've got a rather large JSON file on my Windows machine and it contains stuff like \xE9
. When I JSON.parse
it, it works fine.
However, when I push the code to my server running CentOS, I always get this: "\xE9" on US-ASCII (Encoding::InvalidByteSequenceError)
Here is the output of file
on both machines
Windows:
λ file data.json
data.json: UTF-8 Unicode English text, with very long lines, with no line terminators
CentOS:
$ file data.json
data.json: UTF-8 Unicode English text, with very long lines, with no line terminators
Here is the error I get when trying to parse it:
$ ruby -rjson -e 'JSON.parse(File.read("data.json"))'
/usr/local/rvm/rubies/ruby-2.0.0-p353/lib/ruby/2.0.0/json/common.rb:155:in `encode': "\xC3" on US-ASCII (Encoding::InvalidByteSequenceError)
What could be causing this problem? I've tried using iconv to change the file into every possible encoding I can, but nothing seems to work.
Upvotes: 2
Views: 6181
Reputation: 434665
"\xE9"
is é
in ISO-8859-1 (and various other ISO-8859-X encodings and Windows-1250 and ...) and is certainly not UTF-8.
You can get File.read
to fix up the encoding for you by using the encoding options:
File.read('data.json',
:external_encoding => 'iso-8859-1',
:internal_encoding => 'utf-8'
)
That will give you a UTF-8 encoded string that you can hand to JSON.parse
.
Or you could let JSON.parse
deal with the encoding by using just :external_encoding
to make sure the string comes of the disk with the right encoding flag:
JSON.parse(
File.read('data.json',
:external_encoding => 'iso-8859-1',
)
)
You should have a close look at data.json
to figure out why file(1) thinks it is UTF-8. The file might incorrectly have a BOM when it is not UTF-8 or someone might be mixing UTF-8 and Latin-1 encoded strings in one file.
Upvotes: 8