Reputation: 77
I am trying to extract JSON from a TSV column. The difficulty is the JSON is shallowly nested, and the key values may not be present in every row.
I have a minimal example to illustrate my point.
df <- tibble(index = c(1, 2),
data = c('{"json_char":"alpha", "json_list1":["x","y"]}',
'{"json_char":"beta", "json_list1":["x","y","z"], "json_list2":["a","b","c"]}'))
The desired result:
df <- tibble::tibble(index = list(1, 2),
json_char = list("alpha", "beta"),
json_list1 = list(list("x","y"), list("x","y","z")),
json_list2 = list(NA, list("a","b","c")))
After a fair amount of experimentation, I have this function:
extract_json_column <- function(df) {
df %>%
magrittr::use_series(data) %>%
purrr::map(jsonlite::fromJSON) %>%
purrr::map(purrr::simplify) %>%
tibble::enframe() %>%
tidyr::spread("name", "value") %>%
purrr::flatten_dfr()
}
Which gives me the following error: Error in bind_rows_(x, .id) : Argument 2 must be length 3, not 7
.
The first row sets the number of parameters for the rest of dataframe. Is there anyway to avoid that behavior?
Upvotes: 1
Views: 422
Reputation: 39154
I modified your function to the following. I hope this helps.
library(tidyverse)
library(rjson)
extract_json_column <- function(df){
df %>%
rowwise() %>%
mutate(data = map(data, fromJSON)) %>%
split(.$index) %>%
map(~.$data[[1]]) %>%
map(~map_if(., function(x) length(x) != 1, list)) %>%
map(as_data_frame) %>%
bind_rows(.id = "index")
}
extract_json_column(df)
# A tibble: 2 x 4
index json_char json_list1 json_list2
<chr> <chr> <list> <list>
1 1 alpha <chr [2]> <NULL>
2 2 beta <chr [3]> <chr [3]>
Upvotes: 1