shadow

Reputation: 22293

rbindlist for factors with missing levels

I have several data.tables that I would like to combine with rbindlist. The tables contain factors whose levels are not all used (some levels have no observations). In that case rbindlist(...) behaves differently from do.call(rbind, ...):

library(data.table)

dt1 <- data.table(x=factor(c("a", "b"), levels=letters))

rbindlist(list(dt1, dt1))[,x] 
## [1] a b a b
## Levels: a b

do.call(rbind, list(dt1, dt1))[,x]
## [1] a b a b
## Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z

If I want to keep the levels, do I have to resort to rbind, or is there a data.table way?

Upvotes: 6

Views: 408

Answers (2)

eddi

Reputation: 49448

Thanks for pointing out this problem. As of version 1.8.11 it has been fixed:

dt1 <- data.table(x=factor(c("a", "b"), levels=letters))

rbindlist(list(dt1, dt1))[,x]
#[1] a b a b
#Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
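To see whether your own installation already behaves this way, something like the following might help. It is only a sketch: the 1.8.11 threshold is taken from the statement above, and the names has_fix, res and kept are made up for illustration.

```r
library(data.table)

# 1.8.11 is the version named above as containing the fix
has_fix <- packageVersion("data.table") >= "1.8.11"

dt1 <- data.table(x = factor(c("a", "b"), levels = letters))
res <- rbindlist(list(dt1, dt1))

# TRUE here means the unused levels survived the bind
kept <- identical(levels(res$x), letters)
print(c(has_fix = has_fix, levels_kept = kept))
```

If levels_kept comes back FALSE, the workaround in the other answer still applies.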

Upvotes: 2

agstudy

Reputation: 121568

I guess rbindlist is faster because it skips the checks that do.call(rbind.data.frame, ...) performs.

Why not set the levels after binding?

    Dt <- rbindlist(list(dt1, dt1)) 
    setattr(Dt$x,"levels",letters)  ## set attribute without a copy

From ?setattr:

setattr() is useful in many situations to set attributes by reference and can be used on any object or part of an object, not just data.tables.
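One caveat: overwriting the "levels" attribute relabels the existing integer codes by position, so it is only safe when the levels that survived the bind sit at the same positions in the new vector (true for "a" and "b" at the start of letters). If the tables use different subsets of levels, a more defensive variant is to re-code by label with factor(), at the cost of copying the column. A minimal sketch, where dt2, tables and all_levels are hypothetical names added for illustration:

```r
library(data.table)

dt1 <- data.table(x = factor(c("a", "b"), levels = letters))
dt2 <- data.table(x = factor(c("c", "a"), levels = letters))  # hypothetical second table

tables <- list(dt1, dt2)

# Union of the levels declared in the inputs, in order of first appearance
all_levels <- unique(unlist(lapply(tables, function(d) levels(d$x))))

Dt <- rbindlist(tables)

# factor() matches values by label, so the codes are rebuilt against
# all_levels rather than being relabelled by position
Dt[, x := factor(x, levels = all_levels)]

print(levels(Dt$x))
```

Unlike setattr, this does not depend on how rbindlist ordered the levels it kept, but it gives up the copy-free update.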

Upvotes: 4
