pd441

Reputation: 2763

R: jsonlite package - fromJSON converts json files to character rather than intended list

Given this example data frame, I can easily convert the nested JSON strings into a flattened list, which I can then convert in subsequent steps into a data frame with one column per JSON entry:

sample.df.a <- data.frame(json_col = c('[{"foo_a":"_","foo_c":2}]',
                                 '[{"foo_a":"_","foo_b":"_","foo_c":2,"nested_col":{"foo_d":"_","foo_e":3}}]'))
sample.df.a.list <- apply(sample.df.a, 1, jsonlite::fromJSON, flatten = TRUE)

However my actual data that I need to work with has the following format:

sample.df.b <- as.data.frame(apply(sample.df.a, 1, jsonlite::toJSON))

(This is how the data has come to me and can't be changed; it isn't actually the result of a toJSON conversion as in this engineered example.) With my actual data, when I try to collapse the nested JSON into lists (the desired output, as given by sample.df.a.list), fromJSON instead returns a character vector, which I cannot subsequently convert into a data frame, like so:

sample.df.b.list <- apply(sample.df.b, 1, jsonlite::fromJSON, flatten = TRUE)

Does anyone know how I can create the same sort of collapsed list as sample.df.a.list from sample.df.b?

Thanks in advance!

FYI: subsequent code to convert the lists into a data frame:

library(dplyr)
list.a.as.df <- bind_rows(lapply(sample.df.a.list,data.frame))

Upvotes: 2

Views: 631

Answers (2)

SymbolixAU

Reputation: 26258

Your sample.df.b contains the JSON you need, but each value is wrapped inside a JSON array of one string (between [" and "]), with the inner quotes escaped. One method is to use a regular expression with gsub to remove the outer brackets and quotes (and the extra \ characters), which leaves the JSON you need. Then you just call your usual code:

sample.df.b <- data.frame(json_col = apply(sample.df.a, 1, jsonlite::toJSON))

sample.df.b$json_col <- gsub('^\\[\\"|\\"\\]$|\\\\', "", sample.df.b$json_col)

apply(sample.df.b, 1, jsonlite::fromJSON, flatten = TRUE)

# [[1]]
# foo_a foo_c
# 1     _     2
# 
# [[2]]
# foo_a foo_b foo_c nested_col.foo_d nested_col.foo_e
# 1     _     _     2                _                3 
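One caveat, as a rough sketch rather than a definitive test: because the pattern deletes every backslash, it can also remove escapes that belong to the payload itself. The payload below is hypothetical (it is not from the question) and contains an escaped double quote; jsonlite::validate is used only to check whether the stripped string is still valid JSON.

```r
library(jsonlite)

# Hypothetical payload: the value of foo_a contains an escaped double quote.
inner <- '[{"foo_a":"say \\"hi\\""}]'

# Encode it a second time, which gives the same shape as sample.df.b.
wrapped <- as.character(toJSON(inner))

# The regex approach: drop the leading [", the trailing "], and all backslashes.
stripped <- gsub('^\\[\\"|\\"\\]$|\\\\', "", wrapped)

# The payload's own escapes are gone too, so the result no longer parses.
stripped_ok <- as.logical(validate(stripped))

# Parsing twice instead keeps the payload's escapes intact.
parsed <- fromJSON(fromJSON(wrapped), flatten = TRUE)
```

If your real data never contains backslashes or escaped quotes inside the values, the gsub approach is fine as it stands.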

Upvotes: 1

G. Grothendieck

Reputation: 269852

Apply fromJSON twice:

library(jsonlite)
lapply(lapply(as.character(sample.df.b[[1]]), fromJSON), fromJSON, flatten = TRUE)

giving:

[[1]]
  foo_a foo_c
1     _     2

[[2]]
  foo_a foo_b foo_c nested_col.foo_d nested_col.foo_e
1     _     _     2                _                3
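To see why two passes are needed, here is a minimal sketch on a single value. It assumes the data is double-encoded in the same way as sample.df.b; the names inner, double_encoded, once and twice are made up for illustration.

```r
library(jsonlite)

# A JSON payload that has been encoded a second time, so it arrives as a
# JSON array holding one escaped string (the same shape as sample.df.b).
inner <- '[{"foo_a":"_","foo_c":2}]'
double_encoded <- as.character(toJSON(inner))

# First pass: removes the outer array and unescapes the string.
# The result is still text, which is what the question observed.
once <- fromJSON(double_encoded)
is.character(once)

# Second pass: parses the actual payload into a data frame.
twice <- fromJSON(once, flatten = TRUE)
is.data.frame(twice)
```

The first fromJSON call only undoes the extra layer of encoding; the second one does the parsing you originally wanted.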

Upvotes: 2
