Reputation: 415
I have a data.table
with a list column, where each element is a data.table
:
dt <- data.table(id = c(1, 1, 2),
var = list(data.table(a = c(1, 2), b = c(3, 4)),
data.table(a = c(5, 6), b = c(7, 8)),
data.table(a = 9, b = 10)))
dt
# id var
# 1: 1 <data.table>
# 2: 1 <data.table>
# 3: 2 <data.table>
Now I want to "unlist" this structure to:
a b id
1: 1 3 1
2: 2 4 1
3: 5 7 1
4: 6 8 1
5: 9 10 2
I know how to expand the embedded data.table
part with rbindlist
, but just have no idea how to bind the flattened data.table
with variable "id".
The original dataset is 30 million lines and with dozens of variables, so I would really appreciate if you would propose solution not only workable but also memory efficient.
Upvotes: 4
Views: 1453
Reputation: 66819
In this case dt[, var[[1]], by=id]
works. However, I use rbindlist
as the OP mentioned:
dt[, r := as.character(.I) ]
res <- dt[, rbindlist(setNames(var, r), id="r")]
Then merge on r
(rows of dt
) if you really need any vars from there:
res[dt, on=.(r), `:=`(id = i.id)]
This is better than dt[, var[[1]], by=id]
in a few ways:
rbindlist
should be faster than something with a lot of by=
groups.dt
, all of them will have to end up in by=
.dt
at all, since they can always be grabbed from that table later and they take up a lot less memory there.Upvotes: 6