Rbind-ing data.tables with NA values

Question

I have a big data.table with about 40 columns, and I need to add a record for which I only have 3 of the 40 columns (the rest will be just NA). To make a reproducible example:

require(data.table)
data(iris)
setDT(iris)

# this works (and is the expected result):
rbind(iris, list(6, NA, NA, NA, "test"))

The problem is that my real table has 37+ empty columns (the data I want to add goes in the 1st, 2nd and 37th columns of the table). So I need to use rep() for the NAs. But if I try:

rbind(iris, list(6, rep(NA, 3), "test"))

It won't work, because the list has 3 items while the table has 5 columns. I could do:

rbind(iris, list(c(6, rep(NA, 3), "test")))

But it will (obviously) coerce the whole first column to character. I've tried unlisting the list and reversing the list(c(...)) nesting (rbind only accepts lists), and haven't found anything that works yet.
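The coercion can be reproduced in base R alone. This is a minimal sketch in which the variable name mixed is only illustrative: c() on atomic values of different types produces a single vector of the most general type, here character.

```r
# c() on atomic values returns one atomic vector, so every element
# is converted to the most general type present (character here)
mixed <- c(6, rep(NA, 3), "test")

class(mixed)    # "character"
mixed[1]        # "6", a string rather than a number
```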

Please note that this is not a duplicate of the (several) posts about rbind-ing data.tables, as I'm able to do that. What I haven't been able to do is maintain the proper column classes while using rep(NA, x).

Frank · Accepted Answer

You can do...

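rbind(data.table(iris), c(list(6), rep(NA, 3), list("test")))

     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
  1:          5.1         3.5          1.4         0.2    setosa
  2:          4.9         3.0          1.4         0.2    setosa
  3:          4.7         3.2          1.3         0.2    setosa
  4:          4.6         3.1          1.5         0.2    setosa
  5:          5.0         3.6          1.4         0.2    setosa
 ---                                                            
147:          6.3         2.5          5.0         1.9 virginica
148:          6.5         3.0          5.2         2.0 virginica
149:          6.2         3.4          5.4         2.3 virginica
150:          5.9         3.0          5.1         1.8 virginica
151:          6.0          NA           NA          NA      test

When c() combines a list with an atomic vector, each element of the vector becomes its own list item, so rep(NA, n) contributes n separate NA entries and the row ends up with one item per column. The logical NAs are then coerced to each column's type. (logical(n) is not a substitute: it returns FALSE values, which would become 0 in the numeric columns.) I wrapped iris in data.table() so rbindlist is used instead of rbind.data.frame and "test" is treated as a new factor level instead of an invalid level.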
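How the row is assembled can be checked in base R without any table. This is a minimal sketch in which the names row and filler are only illustrative. It shows that c() applied to a list and an atomic vector gives one list item per vector element, and that logical(n) yields FALSE rather than NA.

```r
# Combining a list with an atomic vector turns each vector element
# into its own list item, so every item keeps its own type
row <- c(list(6), rep(NA, 3), list("test"))

length(row)          # 5 items, one per column
sapply(row, class)   # "numeric" "logical" "logical" "logical" "character"

# logical(n) creates FALSE values, not missing values
filler <- logical(3)
anyNA(filler)        # FALSE
```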

I think there are better ways to go, though, like...

newrow = setDT(iris[NA_integer_, ])
newrow[, `:=`(Sepal.Length = 6, Species = factor("test")) ]
rbind(data.table(iris), newrow)

# or

rbind(data.table(iris), list(Sepal.Length = 6, Species = "test"), fill=TRUE)

These approaches are clearer and don't require fiddling with column counting.
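For a wide table like the one in the question, the fill=TRUE form might look like the sketch below. The table wide and its column names (V1 to V39, label) are hypothetical stand-ins for the real 40-column data. Only the known values are supplied, by name.

```r
library(data.table)

# Hypothetical 40-column table: 39 numeric columns (V1..V39) plus one character column
wide <- as.data.table(matrix(0, nrow = 2, ncol = 39))
wide[, label := c("a", "b")]

# Supply only the known values by name; fill = TRUE pads the other columns with NA
added <- rbind(wide, list(V1 = 6, V2 = 7, label = "test"), fill = TRUE)

# Compare column classes before and after the append
classes_before <- sapply(wide, class)
classes_after  <- sapply(added, class)
identical(classes_before, classes_after)
```

Because the values are matched by column name, nothing depends on counting how many NAs sit between the filled columns.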

I prefer the newrow way, since it leaves a table I can inspect to review the data transformation.
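The same template-row idea can be sketched on a small table with no factor columns. The table dt and its columns are hypothetical. Indexing a data.table with NA_integer_ gives a single all-NA row that keeps each column's class, and that row can be filled in and inspected before it is appended.

```r
library(data.table)

# Hypothetical small table standing in for the real 40-column one
dt <- data.table(id = 1:2, score = c(0.5, 1.5), label = c("a", "b"))

# One all-NA row with the same columns and classes as dt
newrow <- dt[NA_integer_]

# Fill in only the known values; score stays NA
newrow[, `:=`(id = 3L, label = "test")]

print(newrow)                 # review the row before appending it
result <- rbind(dt, newrow)
```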
