Reputation: 1458
How can I get do.call (namespace:base) and rbindlist (namespace:data.table) to behave the same? rbindlist drops factor levels, while do.call(rbind, ...) does not. The following shows the issue:
library(data.table)
(dataList <- list(data.frame(f1=rep(c("a"), each=1),"c"=rnorm(2),"d"=rnorm(2)),
                  data.frame(f1=rep(c("b"), each=1),"c"=rnorm(2),"d"=rnorm(2))) )
(rbindlist.Data <- rbindlist(dataList)) # combines the list into ONE data.table
(do.call.Data <- do.call(rbind, dataList)) # combines the list into ONE data.frame
Upvotes: 0
Views: 323
Reputation: 21285
This behaviour has been fixed in version 1.8.9 of data.table. You can download the latest version from R-forge or wait for 1.9.0 to hit CRAN.
From NEWS:
BUG FIXES
- rbindlist() now binds factor columns correctly, #2650.
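To check whether a given installation already includes that fix, a version comparison along these lines may help. This is only a sketch: has_rbindlist_factor_fix is a hypothetical helper name, and the 1.8.9 threshold is taken from the NEWS entry quoted above.

```r
# Hypothetical helper (not part of data.table): TRUE when the given
# version string is at or past 1.8.9, the release named in NEWS.
has_rbindlist_factor_fix <- function(version) {
  package_version(version) >= package_version("1.8.9")
}

has_rbindlist_factor_fix("1.8.8")
has_rbindlist_factor_fix("1.9.0")

# Only query the installed version if data.table is actually available.
if (requireNamespace("data.table", quietly = TRUE)) {
  installed <- as.character(packageVersion("data.table"))
  print(has_rbindlist_factor_fix(installed))
}
```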
Upvotes: 7
Reputation: 2416
It's true that rbindlist doesn't deal well with factors.
Notice that the internal representation of "a" in dataList[[1]]$f1 and the internal representation of "b" in dataList[[2]]$f1 are both 1; verify this using str(dataList). Unfortunately, rbindlist will combine the internal representations; verify this using str(rbindlist.Data).
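The clash of internal codes can be seen with plain base R factors. The sketch below only imitates the effect described above (concatenating the integer codes and keeping the first set of levels); it is not rbindlist's actual implementation, and the names f_a, f_b, naive and safe are made up for illustration.

```r
# Two single-level factors, mirroring f1 in each element of dataList.
f_a <- factor(c("a", "a"))
f_b <- factor(c("b", "b"))

# Each factor stores its only level as integer code 1.
as.integer(f_a)
as.integer(f_b)

# Imitation of the problem: glue the integer codes together and reuse
# the levels of the first factor. Every value now reads as "a".
naive <- factor(c(as.integer(f_a), as.integer(f_b)),
                levels = 1, labels = levels(f_a))
naive

# Going through character vectors keeps both values and both levels.
safe <- factor(c(as.character(f_a), as.character(f_b)))
safe
```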
The solution is to rbindlist character columns, and not factor columns, unless you're sure the factor columns use exactly the same factor representation (with the same levels and labels). One way to do this is to use data.table consistently:
(dataList <- list(data.table(f1=rep(c("a"), each=1),"c"=rnorm(2),"d"=rnorm(2)),
data.table(f1=rep(c("b"), each=1),"c"=rnorm(2),"d"=rnorm(2))) )
(rbindlist.Data <- rbindlist(dataList))
This produces the desired result, because data.table won't convert strings to factors.
You could use your original code with stringsAsFactors = FALSE (either in the data.frame call or using options). I wouldn't recommend this, though, as there's no harm (and much benefit) in using data.table from the beginning.
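For completeness, the stringsAsFactors = FALSE variant might look like the sketch below. It sets the argument in each data.frame call rather than through options, and the rbindlist step is guarded so the example still runs if data.table isn't installed. Note that since R 4.0.0 data.frame already defaults to stringsAsFactors = FALSE, so on recent R the argument is redundant.

```r
# Build the list with f1 kept as character instead of factor.
dataList <- list(
  data.frame(f1 = "a", c = rnorm(2), d = rnorm(2), stringsAsFactors = FALSE),
  data.frame(f1 = "b", c = rnorm(2), d = rnorm(2), stringsAsFactors = FALSE)
)
str(dataList[[1]]$f1)

# Base R route: f1 stays character and both values survive.
combined <- do.call(rbind, dataList)
unique(combined$f1)

# data.table route, only attempted when the package is available.
if (requireNamespace("data.table", quietly = TRUE)) {
  combined_dt <- data.table::rbindlist(dataList)
  print(unique(combined_dt$f1))
}
```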
If you aren't making the data.frame yourself, you'll have to convert the column types. It's not hard with a data.table call; see Convert column classes in data.table.
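If the linked approach isn't handy, a base R sketch of the conversion could look like this. factors_to_character is a hypothetical helper written for this example, not a data.table function, and the sample list forces stringsAsFactors = TRUE just to stand in for data you didn't create yourself.

```r
# Hypothetical helper: turn every factor column of a data.frame into
# a character column, leaving all other columns untouched.
factors_to_character <- function(df) {
  df[] <- lapply(df, function(col) {
    if (is.factor(col)) as.character(col) else col
  })
  df
}

# Stand-in for externally supplied data with factor columns.
dataList <- list(
  data.frame(f1 = "a", c = rnorm(2), d = rnorm(2), stringsAsFactors = TRUE),
  data.frame(f1 = "b", c = rnorm(2), d = rnorm(2), stringsAsFactors = TRUE)
)

# Convert each element first, then bind.
converted <- lapply(dataList, factors_to_character)
combined <- do.call(rbind, converted)
str(combined$f1)
```

The same converted list can be passed to rbindlist instead of do.call(rbind, ...).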
Upvotes: 4