stu
stu

Reputation: 183

Simple formatting with tidyjson with R

I have a simple JSON file which I'm attempting to coerce into an R data.frame.

json = "
{ \"objects\":
  {
    \"object_one\": {
      \"key1\"        : \"value1\",
      \"key2\"        : \"value2\",
      \"key3\"        : \"0\",
      \"key4\"        : \"value3\",
      \"key5\"        : \"False\",
      \"key6\"        : \"False\"
    },

    \"object_two\": {
      \"key1\"        : \"0.5\",
      \"key2\"        : \"0\",
      \"key3\"        : \"343\",
      \"key4\"        : \"value4\",
      \"key5\"        : \"True\",
      \"key6\"        : \"True\"
    }
  }
}
"

and I simply want to extract the name of each object as a index key (or rowname), create column names from the keys and spread the values.

Unfortunately I've had no luck unpicking the syntax. Can anyone help?

Thanks

Stuart

Upvotes: 1

Views: 1412

Answers (2)

jeremy stanley
jeremy stanley

Reputation: 126

There are two ways to do this with tidyjson, the first is to use tidyjson::append_values_string and then tidyr::spread:

library(tidyjson)
library(dplyr)
library(tidyr)
json %>% 
  enter_object("objects") %>% 
  gather_keys("object") %>% 
  gather_keys("key") %>% 
  append_values_string("value") %>% 
  tbl_df %>% spread(key, value)

#> # A tibble: 2 x 8
#>   document.id     object   key1   key2  key3   key4  key5  key6
#> *       <int>      <chr>  <chr>  <chr> <chr>  <chr> <chr> <chr>
#> 1           1 object_one value1 value2     0 value3 False False
#> 2           1 object_two    0.5      0   343 value4  True  True    

The other way is to use tidyjson::spread_values to specific each key separately:

json %>%
  enter_object("objects") %>%
  gather_keys("object") %>%
  spread_values(
    key1 = jstring("key1"),
    key2 = jstring("key2"),
    key3 = jnumber("key3"),
    key4 = jstring("key4"),
    key5 = jstring("key5"),
    key6 = jstring("key6")
  )

#>   document.id     object   key1   key2 key3   key4  key5  key6
#> 1           1 object_one value1 value2    0 value3 False False
#> 2           1 object_two    0.5      0  343 value4  True  True

The advantage of the second approach is that you can (a) specify the types of each column and (b) will be guaranteed to get the same data.frame structure even if the keys change (or are missing) in some documents or objects.

Upvotes: 2

SymbolixAU
SymbolixAU

Reputation: 26248

Not entirely sure on your desired output, but you can use jsonlite::fromJSON to extract the data, and data.table::rbindlist to put it into a data.table

library(jsonlite)
library(data.table)

rbindlist(fromJSON(json))

#     object_one object_two
# 1:     value1        0.5
# 2:     value2          0
# 3:          0        343
# 4:     value3     value4
# 5:      False       True
# 6:      False       True

Based on your comment, another approach that involves some reshaping

library(jsonlite)
library(reshape2)

lst <- fromJSON(json)
lst <- lapply(lst[[1]], unlist)

df <- as.data.frame(lst)
df$key <- rownames(df)

df <- melt(df, id = "key")
df <- dcast(df, formula = variable ~ key)
df
#      variable   key1   key2 key3   key4  key5  key6
# 1 object_one value1 value2    0 value3 False False
# 2 object_two    0.5      0  343 value4  True  True

Upvotes: 1

Related Questions