Reputation: 187
Here is the raw JSON data:
json_file <- '{"name":"Doe, John","group":"Red","age":{"v_0":24}}
{"name":"Doe, Jane","group":"Green","age":{"v_0":31}}
{"name":"Smith, Joan","group":"Yellow","age":{"v_0":22}}'
When I try to convert json_file to a data frame:
library(RJSONIO)
json_file <- fromJSON(json_file)
I get this error:
Error: parse error: trailing garbage
:"Red","age":{"v_0":24}} {"name":"Doe, Jane","group":"Gr
(right here) ------^
I know that if I change the raw data to the following, everything works:
json_file <- '[{"name":"Doe, John","group":"Red","age":{"v_0":24}},
{"name":"Doe, Jane","group":"Green","age":{"v_0":31}},
{"name":"Smith, Joan","group":"Yellow","age":{"v_0":22}}]'
But what I would actually like to know is:
1) How can I get a data frame from the raw data without separating its objects with [ , , and ]?
2) If there is no way, how can I separate the objects in a large JSON file by adding , to the end of each line except the last, and adding [ and ] to the first and last lines of the file?
Upvotes: 6
Views: 6015
Reputation: 1166
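You can do this without editing the file. The stream_in function from jsonlite reads newline-delimited JSON, one object per line, which is the format you have.
If you want a data.frame: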
library(jsonlite)
# url
zips <- stream_in(url("http://media.mongodb.org/zips.json"))
# file
json_data <- stream_in(file("path/to/file.json"))
or, if you want a list (written without %>%, which jsonlite does not provide):
json_data_as_list <- lapply(readLines("path/to/file.json"), fromJSON)
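Applied to the string from the question, the streaming approach might look like the sketch below. It assumes the jsonlite package is installed; lines and df are illustrative names, and textConnection() stands in for the file or URL connection used above.

```r
library(jsonlite)

# Raw data from the question: one JSON object per line.
json_file <- '{"name":"Doe, John","group":"Red","age":{"v_0":24}}
{"name":"Doe, Jane","group":"Green","age":{"v_0":31}}
{"name":"Smith, Joan","group":"Yellow","age":{"v_0":22}}'

# Split the string into lines and expose them as a connection.
lines <- strsplit(json_file, "\n", fixed = TRUE)[[1]]
df <- stream_in(textConnection(lines), verbose = FALSE)

# The nested "age" object arrives as a data-frame column; flatten()
# expands it into an ordinary column named age.v_0.
df <- flatten(df)
str(df)
```

The flatten() step is only needed because age is a nested object; for flat records the result of stream_in() is already an ordinary data frame.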
Upvotes: 7
Reputation: 20302
You need those square brackets. Save the following as 'test.json':
{
"ID":["1","2","3","4","5","6","7","8" ],
"Name":["Rick","Dan","Michelle","Ryan","Gary","Nina","Simon","Guru" ],
"Salary":["623.3","515.2","611","729","843.25","578","632.8","722.5" ],
"StartDate":[ "1/1/2012","9/23/2013","11/15/2014","5/11/2014","3/27/2015","5/21/2013",
"7/30/2013","6/17/2014"],
"Dept":[ "IT","Operations","IT","HR","Finance","IT","Operations","Finance"]
}
Now, load the required library and point to that file you just saved:
# Load the package required to read JSON files.
library("rjson")
# Give the input file name to the function.
result <- fromJSON(file = "C:\\Users\\Excel\\Documents\\test.json")
# Print the result.
print(result)
Result:
print(result)
$ID
[1] "1" "2" "3" "4" "5" "6" "7" "8"
$Name
[1] "Rick" "Dan" "Michelle" "Ryan" "Gary" "Nina" "Simon" "Guru"
$Salary
[1] "623.3" "515.2" "611" "729" "843.25" "578" "632.8" "722.5"
$StartDate
[1] "1/1/2012" "9/23/2013" "11/15/2014" "5/11/2014" "3/27/2015" "5/21/2013" "7/30/2013" "6/17/2014"
$Dept
[1] "IT" "Operations" "IT" "HR" "Finance" "IT" "Operations" "Finance"
Upvotes: -2
Reputation: 5766
Your raw JSON data is already split into individual objects, one per line, but taken as a whole it is not valid JSON. As you noticed, if you insert , at the end of each line (except the last) and wrap everything in square brackets, you get a valid array of objects, each a collection of key-value pairs. So the better question is, "How do I combine all elements into a single data.frame?"
The solution: dplyr::bind_rows(fromJSON(json_file))
# A tibble: 3 x 3
name group age
<chr> <chr> <dbl>
1 Doe, John Red 24
2 Doe, Jane Green 31
3 Smith, Joan Yellow 22
Followup:
Assuming the json objects do not contain newlines, you can do an easy search-replace:
json_file <- gsub('\n', ',', trimws(json_file), fixed=TRUE)
The trimws call removes any leading or trailing newlines, which would otherwise turn into stray commas.
Next, you wrap it with square brackets:
json_file <- paste0('[', json_file, ']')
and you're back on track.
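Put together, the two steps and the final parse could be sketched as follows. This version uses jsonlite::fromJSON with flatten = TRUE instead of the RJSONIO and dplyr combination above, which is a substitution made here for illustration; json_array and df are illustrative names.

```r
library(jsonlite)

# Raw data from the question: one JSON object per line.
json_file <- '{"name":"Doe, John","group":"Red","age":{"v_0":24}}
{"name":"Doe, Jane","group":"Green","age":{"v_0":31}}
{"name":"Smith, Joan","group":"Yellow","age":{"v_0":22}}'

# Replace the newlines between objects with commas, then wrap the
# result in square brackets to form a JSON array.
json_array <- gsub("\n", ",", trimws(json_file), fixed = TRUE)
json_array <- paste0("[", json_array, "]")

# Parse the array; flatten = TRUE expands the nested "age" object
# into a column named age.v_0.
df <- fromJSON(json_array, flatten = TRUE)
print(df)
```

For data stored in a file, the same array string can be built from readLines(), joining the lines with paste(..., collapse = ","), provided each line holds exactly one complete object.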
Upvotes: 2