Tyler Rinker
Tyler Rinker

Reputation: 109984

Fast convert of many rows to json character

I have ~15 data.frames with 100K-300K rows. I want to condense on the variable v the other columns into a character json format for condensed storage reasons. Note that each group in v will have multiple rows (1 or more; likely more). I have code below that inefficiently uses the jsonlite package to convert but because of how I set up the splits it's slow and not memory efficient. How could I do this faster and more memory efficient. I don't need to use the jsonlite package just did because it's the only way I knew how to do this. I'm thinking there's a way to make the character json directly using data.table in a fast way but can't think of how to do this.

PS if it helps to know the motivation...I'm doing this to have a hash table that I can look up v in and then convert the json back to an R data.frame on the fly. Maybe there's a way to use jsonlite more directly than I am but toJSON(dat) is not what I'm after.

MWE

set.seed(10)

dat <- data.frame(
    v = rep(c('red', 'blue'), each =3),
    w = sample(LETTERS, 6),
    x = sample(1:3, 6, T),
    y = sample(1:3, 6, T),
    z = sample(1:3, 6, T),
    stringsAsFactors = FALSE
)

dat

Data View

     v w x y z
1  red N 1 1 2
2  red H 1 2 3
3  red K 2 2 3
4 blue P 2 2 2
5 blue B 2 1 3
6 blue E 2 1 2

Coverting

library(jsonlite)
jsonlist <- lapply(split(dat[-1], dat$v), function(x) as.character(toJSON(x)))

data.frame(
    v = names(jsonlist),
    json = unlist(jsonlist, use.names=FALSE),
    stringsAsFactors = FALSE
)

Desired Result

      v                                                                                  json
1 blue [{"w":"P","x":2,"y":2,"z":2},{"w":"B","x":2,"y":1,"z":3},{"w":"E","x":2,"y":1,"z":2}]
2  red [{"w":"N","x":1,"y":1,"z":2},{"w":"H","x":1,"y":2,"z":3},{"w":"K","x":2,"y":2,"z":3}]

Upvotes: 4

Views: 288

Answers (2)

bramtayl
bramtayl

Reputation: 4024

I'm still not convinced what you're doing makes sense, but:

dat %>%
  group_by(v) %>%
  do(json = select(., -v) %>% toJSON ) %>%
  mutate(json = unlist(json))

Upvotes: 1

Jota
Jota

Reputation: 17611

Using a data.table, you can group by v and pass .SD to toJSON:

library(data.table)
setDT(dat)
dat[, toJSON(.SD), by = v]
#      v                                                                                    V1
#1:  red [{"w":"N","x":1,"y":1,"z":2},{"w":"H","x":1,"y":2,"z":3},{"w":"K","x":2,"y":2,"z":3}]
#2: blue [{"w":"P","x":2,"y":2,"z":2},{"w":"B","x":2,"y":1,"z":3},{"w":"E","x":2,"y":1,"z":2}]

Upvotes: 6

Related Questions