Reputation: 3855
I have dataframe, called data_df
, which has one column which contain json string, column name is json_response
.
I want access very specific key-value from it. Example of one of json string as follows. I want to know how many times success is true in string.
x = "[{\"s\":\"D\",\"success\":true,\"start.time\":\"2016-01-27 19:27:27\",\"stop.time\":\"2016-01-27 19:27:30\",\"status_code\":200,\"called\":true,\"milliseconds\":3738.6858,\"_row\":\"DataX\"},{\"s\":\"C\",\"success\":true,\"start.time\":\"2016-01-27 19:27:30\",\"stop.time\":\"2016-01-27 19:27:32\",\"status_code\":200,\"called\":true,\"milliseconds\":1815.1433,\"_row\":\"Clarity\"}]"
If I only want to use tidyjson
, I can do it as follows, which works as I want.
library(dplyr)
library(tidyjson)
x %>% gather_array %>%
spread_values(called = jstring("called")) %>%
summarize(x = sum(called == "TRUE"))
Now if I want to do it for whole column, how should I do it? I don't want to use a loop.
Following is my code which I tried to use.
data_df %>%
transmute(
test = json_response %>% gather_array %>%
spread_values(called = jstring("called")) %>%
summarize(x = sum(called=="TRUE"))
)
Following is the error I got when I ran the above code:
Error: not compatible with STRSXP
Upvotes: 0
Views: 409
Reputation: 126
You can use tidyjson
for this, simply convert data_df
into a tbl_json
object, and then proceed as before:
data_df %>%
as.tbl_json(json.column = "json_response") %>%
# track each document if you don't already have an ID
mutate(rownum = 1:n()) %>%
gather_array %>%
# use jlogical for correct type
spread_values(success = jlogical("success")) %>%
group_by(rownum) %>%
summarize(num.successes = sum(success))
Upvotes: 0
Reputation: 214927
Instead of using tidyjson
you can use rjson
combined with dplyr
in a way like this:
data_df$test <- data_df %>% rowwise %>%
do(test = .$json_response %>% as.character %>% fromJSON %>% sapply(`[[`, "called") %>% sum) %>%
as.data.frame
Upvotes: 2