Reputation: 766
I have list of JSON values (actually it's a text file where every line is one JSON object). Like this:
{ "id": 1, "name": "john", "age": 18, "education": "master" }
{ "id": 2, "name": "jack", "job": "clerk" }
...
Some of the values can be missing (e.g. first item doesn't have "job" value and second item doesn't have "education" and "age").
I need to create data frame in R and fill all missing column values as NAs (if field with unique name exists in at least one row). How to achieve this easier?
What I already done - I installed "rjson" package and parsed these lines to R lists. Let's assume that lines variable is a character vector of lines.
library(rjson)
lines <- // initialize "lines" var here
jsons <- sapply(lines, fromJSON)
"jsons" variable became "list of lists" (every JSON object is converted to list in R terminology). How to convert it to data.frame?
I want to see the following data frame for the example I provided:
"id" | "name" | "age" | "education" | "job"
-------------------------------------------
1 | "john" | 18 | "master" | NA
2 | "jack | NA | NA | "clerk"
Upvotes: 1
Views: 694
Reputation: 93803
Future me, correcting past me's mistakes. It would make more sense to use jsonlite
's stream_in
stream_in(txtfile)
# To test on `txt` from below, try:
# stream_in(textConnection(txt))
# Found 2 records...
# Imported 2 records. Simplifying...
# id name age education job
#1 NA john 18 master <NA>
#2 2 jack NA <NA> clerk
Use the jsonlite
package's fromJSON
function, after making a few inline edits to your original text data (I've also edited the first piece of id
data to include an explicit null
value, to show that it deals with this):
fromJSON(paste0("[", gsub("}\n", "},\n", txt), "]"))
# id name age education job
#1 NA john 18 master <NA>
#2 2 jack NA <NA> clerk
All I did was add a little formatting to wrap all the JSON lines together in [
and ]
and add a comma at the end of each closing }
- resulting in an output like the below which can be processed all at once by jsonlite::fromJSON
:
[{"1":"one"},{"2":"two"}]
Where txt
was your lines of data as presented, with a null
in the id
variable:
txt <- "{ \"id\": null, \"name\": \"john\", \"age\": 18, \"education\": \"master\" }
{ \"id\": 2, \"name\": \"jack\", \"job\": \"clerk\" }"
Upvotes: 3
Reputation: 32416
From plyr
you can use rbind.fill
to add the NAs for you
library(plyr)
rbind.fill(sapply(jsons, data.frame), jsons)
# id name age education job
# 1 1 john 18 master <NA>
# 2 2 jack NA <NA> clerk
or from data.table
library(data.table)
rbindlist(jsons, fill=T)
and dplyr
library(dplyr)
bind_rows(sapply(jsons, data.frame))
Upvotes: 3