Olga Makarova

Reputation: 121

R fromJSON incorrectly reads Unicode from file

I am trying to read a JSON object from a file in R. The file contains first and last names written as Unicode escape sequences. Here is the content of the file "x1.json":

{"general": {"last_name":
"\u041f\u0430\u0449\u0435\u043d\u043a\u043e", "name":
"\u0412\u0456\u0442\u0430\u043b\u0456\u0439"}}

I use the RJSONIO package. When I pass the JSON object directly as a string literal, everything works:

x<-fromJSON('{"general": {"last_name": "\u041f\u0430\u0449\u0435\u043d\u043a\u043e", "name": "\u0412\u0456\u0442\u0430\u043b\u0456\u0439"}}')
x
# $general
# last_name      name 
# "Пащенко" "Віталій" 

But when I read the same content from the file, the strings come out in an encoding I don't recognize:

x1<-fromJSON("x1.json")
x1
# $general
#    last_name         name 
# "\0370I5=:>" "\022VB0;V9" 

Note that this is not the escaped "\u" problem (which was discussed here).

I tried specifying the "encoding" argument, but that did not help:

> x1<-fromJSON("x1.json", encoding = "UTF-8")
> x1
$general
   last_name         name 
"\0370I5=:>" "\022VB0;V9" 

System information:

> Sys.getlocale()
[1] "LC_COLLATE=Ukrainian_Ukraine.1251;LC_CTYPE=Ukrainian_Ukraine.1251;LC_MONETARY=Ukrainian_Ukraine.1251;LC_NUMERIC=C;LC_TIME=Ukrainian_Ukraine.1251"

Switching to an English locale (Sys.setlocale("LC_ALL","English")) did not change anything.

Upvotes: 2

Views: 5500

Answers (2)

GreyDesolate

Reputation: 29

Use the jsonlite package instead of RJSONIO:

library("jsonlite")
x1 <- fromJSON("x1.json")

and it should be fine.

Upvotes: 0

MikA

Reputation: 5552

If your file contained the actual Unicode characters (rather than their \u escape sequences), like this:

{"general": {"last_name":"Пащенко", "name":"Віталій"}}

then

> fromJSON("x1.json", encoding = "UTF-8")

would work.

If you really need your code to work with the current file, try this:

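JSONstring <- ""
con <- file("x1.json", open = "r")
# Parse each line as an R string literal so that R itself decodes the \u escapes.
# Note: this breaks if a line contains a single quote.
while (length(oneLine <- readLines(con, n = 1, warn = FALSE)) > 0) {
  JSONstring <- paste0(JSONstring, parse(text = paste0("'", oneLine, "'"))[[1]])
}
close(con)  # release the file connection
fromJSON(JSONstring)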
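
As a side note, the garbled strings in the question look like each character's Unicode code point truncated to its low byte (for example, U+041F becomes 0x1F, which R prints as \037). That would suggest the high byte of each \u escape is being dropped when the file is parsed, though I have not checked the parser itself. A minimal base-R sketch to verify the pattern; low_bytes is a hypothetical helper name, not part of RJSONIO:

```r
# Keep only the low byte of every code point in a string.
# low_bytes is a hypothetical helper used for illustration only.
low_bytes <- function(s) {
  code_points <- utf8ToInt(s)          # integer Unicode code points
  rawToChar(as.raw(code_points %% 256L))
}

# The two names from x1.json, written with \u escapes so the
# script does not depend on the source file's encoding.
last_name <- "\u041f\u0430\u0449\u0435\u043d\u043a\u043e"
name      <- "\u0412\u0456\u0442\u0430\u043b\u0456\u0439"

print(low_bytes(last_name))  # "\0370I5=:>"
print(low_bytes(name))       # "\022VB0;V9"
```

Both results match the output shown in the question, so the original characters are lost at parse time and cannot be recovered by changing the locale or the "encoding" argument afterwards.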

Upvotes: 1
