Reputation: 13548
I have a large (millions of rows) melted data.table
with the usual melt
-style unrolling in the variable
and value
columns. I need to cast the table in wide form (rolling the variables up). The problem is that the data table also has a list column called data
, which I need to preserve. This makes it impossible to use reshape2
because dcast
cannot deal with non-atomic columns. Therefore, I need to do the rolling up myself.
The answer from a previous question about working with melted data tables does not apply here because of the list column.
I am not satisfied with the solution I've come up with. I'm looking for suggestions for a simpler/faster implementation.
x <- LETTERS[1:3]
dt <- data.table(
x=rep(x, each=2),
y='d',
data=list(list(), list(), list(), list(), list(), list()),
variable=rep(c('var.1', 'var.2'), 3),
value=seq(1,6)
)
# Column template set up
list_template <- Reduce(
function(l, col) { l[[col]] <- col; l },
unique(dt$variable),
list())
# Expression set up
q <- substitute({
l <- lapply(
list_template,
function(col) .SD[variable==as.character(col)]$value)
l$data = .SD[1,]$data
l
}, list(list_template=list_template))
# Roll up
dt[, eval(q), by=list(x, y)]
x y var.1 var.2 data
1: A d 1 2 <list>
2: B d 3 4 <list>
3: C d 5 6 <list>
Upvotes: 7
Views: 3103
Reputation: 42592
This old question piqued my curiosity as data.table
has been improved sigificantly since 2013.
However, even with data.table
version 1.11.4
dcast(dt, x + y + data ~ variable)
still returns an error
Columns specified in formula can not be of type list
The workaround follows the general outline of jonsedar's answer :
data
grouped by x
and y
x
and y
but uses the features of the actual data.table
syntax, e.g., the on
parameter:
dcast(dt, x + y ~ variable)[
dt[, .(data = .(first(data))), by = .(x, y)], on = .(x, y)]
x y var.1 var.2 data 1: A d 1 2 <list> 2: B d 3 4 <list> 3: C d 5 6 <list>
The list column data
is aggregated by taking the first element. This is in line with OP's code line
l$data = .SD[1,]$data
which also picks the first element.
Upvotes: 2
Reputation: 314
I have somewhat cheating method that might do the trick - importantly, I assume that each x,y,list combination is unique! If not, please disregard.
I'm going to create two separate datatables, the first which is dcasted without the data list objects, and the second which has only the unique data list objects and a key. Then just merge them together to get the desired result.
require(data.table)
require(stringr)
require(reshape2)
x <- LETTERS[1:3]
dt <- data.table(
x=rep(x, each=2),
y='d',
data=list(list("a","b"), list("c","d")),
variable=rep(c('var.1', 'var.2'), 3),
value=seq(1,6)
)
# First create the dcasted datatable without the pesky list objects:
dt_nolist <- dt[,list(x,y,variable,value)]
dt_dcast <- data.table(dcast(dt_nolist,x+y~variable,value.var="value")
,key=c("x","y"))
# Second: create a datatable with only unique "groups" of x,y, list
dt_list <- dt[,list(x,y,data)]
# Rows are duplicated so I'd like to use unique() to get rid of them, but
# unique() doesn't work when there's list objects in the data.table.
# Instead so I cheat by applying a value to each row within an x,y "group"
# that is unique within EACH group, but present within EVERY group.
# Then just simply subselect based on that unique value.
# I've chosen rank(), but no doubt there's other options
dt_list <- dt_list[,rank:=rank(str_c(x,y),ties.method="first"),by=str_c(x,y)]
# now keep only one row per x,y "group"
dt_list <- dt_list[rank==1]
setkeyv(dt_list,c("x","y"))
# drop the rank since we no longer need it
dt_list[,rank:=NULL]
# Finally just merge back together
dt_final <- merge(dt_dcast,dt_list)
Upvotes: 1