Expand list column of data.tables

Question

I have a data.table with a list column, where each element is a data.table:

dt <- data.table(id = c(1, 1, 2),
                 var = list(data.table(a = c(1, 2), b = c(3, 4)),
                            data.table(a = c(5, 6), b = c(7, 8)),
                            data.table(a = 9, b = 10)))

dt
# id             var
# 1:  1 
# 2:  1 
# 3:  2

Now I want to "unlist" this structure to:

I know how to expand the embedded data.table part with rbindlist, but just have no idea how to bind the flattened data.table with variable "id".

The original dataset is 30 million lines and with dozens of variables, so I would really appreciate if you would propose solution not only workable but also memory efficient.

Frank · Accepted Answer

In this case dt[, var[[1]], by=id] works. However, I use rbindlist as the OP mentioned:

dt[, r := as.character(.I) ]
res <- dt[, rbindlist(setNames(var, r), id="r")]

Then merge on r (rows of dt) if you really need any vars from there:

res[dt, on=.(r), `:=`(id = i.id)]

This is better than dt[, var[[1]], by=id] in a few ways:

rbindlist should be faster than something with a lot of by= groups.
If there are more vars in dt, all of them will have to end up in by=.
Probably, it is not necessary to carry over vars from dt at all, since they can always be grabbed from that table later and they take up a lot less memory there.

Expand list column of data.tables

Answers (1)

Related Questions