user3646834

Adding list columns to data tables in R returns inconsistent output - feature or bug?

I use $ to add a list column to a data.table in R. When the data.table has more than one row, this works as expected.

library(data.table)

dt2 <- data.table(x = 1:2)
dt2$y <- list(c(1, 1), c(2, 2))
dt2
#>    x   y
#> 1: 1 1,1
#> 2: 2 2,2

However, when the data.table has exactly one row, only the first element of the vector in the list is kept, and a warning is issued:

dt1 <- data.table(x = 1)
dt1$y <- list(c(1, 1))
#> Warning in `[<-.data.table`(x, j = name, value = value): Supplied 2 items
#> to be assigned to 1 items of column 'y' (1 unused)
dt1
#>    x y
#> 1: 1 1

This seems inconsistent. Is it a feature or a bug?

By contrast, doing the same thing with data.frames returns the expected output, regardless of the number of rows in the data.frame.

df1 <- data.frame(x = 1)
df1$y <- list(c(1, 1))
df1
#>   x    y
#> 1 1 1, 1

df2 <- data.frame(x = 1:2)
df2$y <- list(c(1, 1), c(2, 2))
df2
#>   x    y
#> 1 1 1, 1
#> 2 2 2, 2

Upvotes: 6

Views: 186

Answers (3)

Frank

Reputation: 66819

From vignette("datatable-intro"):

As long as j returns a list, each element of the list will become a column in the resulting data.table.

In your code...

dt1 <- data.table(x = 1)
dt1$y <- list(c(1, 1))

Here list(c(1, 1)) is treated like a list returned by j: its first (and only) element, a length-two vector, is interpreted as a length-two column. Since your data.table has only one row, the two values do not fit and you get the warning. As noted in Uwe's answer, the way around this is to wrap the value in an extra list(...).
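
To see the vignette's rule in isolation, here is a minimal sketch. The table dt and the column names a, b and cells are made up for illustration and do not come from the question.

```r
library(data.table)

dt <- data.table(x = 1:2)

# j returns a list with two elements, so the result has two columns
res <- dt[, list(a = x, b = x * 10)]

# A list column needs one more level of nesting: the outer list() holds
# the columns, the inner list() holds the cells of the single column
res_nested <- dt[, list(cells = list(c(1, 1), c(2, 2)))]
str(res_nested)
```

The same reading explains the one-row case: with a single level of list(), the vector itself is taken as the column rather than as one cell of it.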

vignette("datatable-reference-semantics") brings up a convenience feature:

DT[, c("colA", "colB", ...) := list(valA, valB, ...)]

# when you have only one column to assign to you
# can drop the quotes and list(), for convenience
DT[, colA := valA]

And this works in your other code...

dt2 <- data.table(x = 1:2)
dt2$y <- list(c(1, 1), c(2, 2))

... but falls apart as you noticed in the special case of one row where valA should create a list column, so it's better to follow the advice in Uwe's answer: consistently wrapping in an extra list(...) or .(...).

Also see "What are the smaller syntax differences between data.frame and data.table?" in vignette("datatable-faq") for other differences with data frames.

Side note: There's little point using a data.table if you're going to assign like DT$y <- v. It kind of defeats the purpose of the package to avoid the syntax that supports modifying the table by reference, namely DT[, y := v]...
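
As a rough illustration of what "by reference" means here, the sketch below gives the same table a second name and then adds a list column with :=. The name alias is hypothetical and only serves the demonstration.

```r
library(data.table)

dt <- data.table(x = 1:2)
alias <- dt   # a second name for the same table, not a copy

# := adds the column in place, so the change is visible through alias too
dt[, y := .(.(c(1, 1), c(2, 2)))]

"y" %in% names(alias)
```

If an independent copy is wanted instead, copy(dt) is the documented way to get one.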

Upvotes: 2

Uwe

Reputation: 42564

Besides Andre Elrico's suggestion to use the [[<- operator, consistent behaviour can also be ensured by using a double-nested list(). This works for the $<- operator as well as for data.table's := assignment operator.

2 row case

library(data.table)
dt2 <- data.table(x = 1:2)
dt2$y <- list(list(c(1, 1), c(2, 2)))
str(dt2)

dt2 <- data.table(x = 1:2)
dt2[, y := .(.(c(1, 1), c(2, 2)))]
str(dt2)

In both variants str(dt2) returns the same:

Classes ‘data.table’ and 'data.frame':    2 obs. of  2 variables:
 $ x: int  1 2
 $ y:List of 2
  ..$ : num  1 1
  ..$ : num  2 2
 - attr(*, ".internal.selfref")=<externalptr>

Please note that in data.table syntax list() can be abbreviated as .().

For comparison, here is the code which was used by the OP

dt2 <- data.table(x = 1:2)
dt2$y <- list(c(1, 1), c(2, 2))
str(dt2)

which creates the same structure

Classes ‘data.table’ and 'data.frame':    2 obs. of  2 variables:
 $ x: int  1 2
 $ y:List of 2
  ..$ : num  1 1
  ..$ : num  2 2
 - attr(*, ".internal.selfref")=<externalptr>

1 row case

dt1 <- data.table(x = 1)
dt1$y <- list(list(c(1, 1)))
str(dt1)

dt1 <- data.table(x = 1)
dt1[, y := .(.(c(1, 1)))]
str(dt1)

Again, the output of str(dt1) is identical for both code variants and also consistent with the 2 row case.

Classes ‘data.table’ and 'data.frame':    1 obs. of  2 variables:
 $ x: num 1
 $ y:List of 1
  ..$ : num  1 1
 - attr(*, ".internal.selfref")=<externalptr>
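
If the double nesting is easy to forget, it could be hidden in a small wrapper. The sketch below is only one possible way to do that: add_list_col is a hypothetical helper, not part of data.table, and it assumes cells is a list with exactly one element per row.

```r
library(data.table)

# Hypothetical helper: cells must be a list with one element per row of dt
add_list_col <- function(dt, name, cells) {
  stopifnot(is.list(cells), length(cells) == nrow(dt))
  # list(cells) supplies the extra level of nesting; (name) makes := use
  # the string stored in name as the column name
  dt[, (name) := list(cells)]
  invisible(dt)
}

dt1 <- add_list_col(data.table(x = 1), "y", list(c(1, 1)))
dt2 <- add_list_col(data.table(x = 1:2), "y", list(c(1, 1), c(2, 2)))
str(dt1)
str(dt2)
```

Because := works by reference, the helper modifies the table it is given and returns it invisibly.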

Upvotes: 3

Andre Elrico

Reputation: 11490

It's strange behavior. Feel free to open an issue about it. I don't like $ anyway, because of problems like this and because of its static character.

For lists I like [[]]

Get your consistent behavior like this:

library(data.table)

dt1 <- data.table(x = 1)
dt1[["y"]] <- list(c(1, 1))

dt2 <- data.table(x = 1:2)
dt2[["y"]] <- list(c(1, 1), c(2, 2))

Upvotes: 2
