Kush Patel
Kush Patel

Reputation: 3855

how to use tidyjson inside dplyr

I have dataframe, called data_df, which has one column which contain json string, column name is json_response.

I want access very specific key-value from it. Example of one of json string as follows. I want to know how many times success is true in string.

x = "[{\"s\":\"D\",\"success\":true,\"start.time\":\"2016-01-27 19:27:27\",\"stop.time\":\"2016-01-27 19:27:30\",\"status_code\":200,\"called\":true,\"milliseconds\":3738.6858,\"_row\":\"DataX\"},{\"s\":\"C\",\"success\":true,\"start.time\":\"2016-01-27 19:27:30\",\"stop.time\":\"2016-01-27 19:27:32\",\"status_code\":200,\"called\":true,\"milliseconds\":1815.1433,\"_row\":\"Clarity\"}]"

If I only want to use tidyjson, I can do it as follows, which works as I want.

library(dplyr)
library(tidyjson) 
x %>% gather_array %>% 
    spread_values(called = jstring("called")) %>% 
    summarize(x = sum(called == "TRUE"))

Now if I want to do it for whole column, how should I do it? I don't want to use a loop.

Following is my code which I tried to use.

data_df %>% 
    transmute(
              test =  json_response %>% gather_array %>% 
                  spread_values(called = jstring("called")) %>% 
                  summarize(x = sum(called=="TRUE")) 
    )

Following is the error I got when I ran the above code:

Error: not compatible with STRSXP

Upvotes: 0

Views: 409

Answers (2)

jeremy stanley
jeremy stanley

Reputation: 126

You can use tidyjson for this, simply convert data_df into a tbl_json object, and then proceed as before:

data_df %>% 
  as.tbl_json(json.column = "json_response") %>%
  # track each document if you don't already have an ID
  mutate(rownum = 1:n()) %>% 
  gather_array %>%           
  # use jlogical for correct type
  spread_values(success = jlogical("success")) %>%  
  group_by(rownum) %>% 
  summarize(num.successes = sum(success))

Upvotes: 0

akuiper
akuiper

Reputation: 214927

Instead of using tidyjson you can use rjson combined with dplyr in a way like this:

data_df$test <- data_df %>% rowwise %>% 
      do(test = .$json_response %>% as.character %>% fromJSON %>% sapply(`[[`, "called") %>% sum) %>% 
      as.data.frame

Upvotes: 2

Related Questions