R. Zhu
R. Zhu

Reputation: 415

Expand list column of data.tables

I have a data.table with a list column, where each element is a data.table:

dt <- data.table(id = c(1, 1, 2),
                 var = list(data.table(a = c(1, 2), b = c(3, 4)),
                            data.table(a = c(5, 6), b = c(7, 8)),
                            data.table(a = 9, b = 10)))

dt
# id             var
# 1:  1 <data.table>
# 2:  1 <data.table>
# 3:  2 <data.table>

Now I want to "unlist" this structure to:

   a  b id
1: 1  3  1
2: 2  4  1
3: 5  7  1
4: 6  8  1
5: 9 10  2

I know how to expand the embedded data.table part with rbindlist, but just have no idea how to bind the flattened data.table with variable "id".

The original dataset is 30 million lines and with dozens of variables, so I would really appreciate if you would propose solution not only workable but also memory efficient.

Upvotes: 4

Views: 1453

Answers (1)

Frank
Frank

Reputation: 66819

In this case dt[, var[[1]], by=id] works. However, I use rbindlist as the OP mentioned:

dt[, r := as.character(.I) ]
res <- dt[, rbindlist(setNames(var, r), id="r")]

Then merge on r (rows of dt) if you really need any vars from there:

res[dt, on=.(r), `:=`(id = i.id)]

This is better than dt[, var[[1]], by=id] in a few ways:

  • rbindlist should be faster than something with a lot of by= groups.
  • If there are more vars in dt, all of them will have to end up in by=.
  • Probably, it is not necessary to carry over vars from dt at all, since they can always be grabbed from that table later and they take up a lot less memory there.

Upvotes: 6

Related Questions