I use $ to add a list column to a data.table in R. When the data.table has more than one row, this works as expected.
library(data.table)
dt2 <- data.table(x = 1:2)
dt2$y <- list(c(1, 1), c(2, 2))
dt2
#> x y
#> 1: 1 1,1
#> 2: 2 2,2
However, when the data.table has exactly one row, only the first element of the vector in the list is kept, and a warning is raised:
dt1 <- data.table(x = 1)
dt1$y <- list(c(1, 1))
#> Warning in `[<-.data.table`(x, j = name, value = value): Supplied 2 items
#> to be assigned to 1 items of column 'y' (1 unused)
dt1
#> x y
#> 1: 1 1
This seems inconsistent. Is it a feature or a bug?
By contrast, doing the same thing with data.frames returns the expected output, regardless of the number of rows in the data.frame.
df1 <- data.frame(x = 1)
df1$y <- list(c(1, 1))
df1
#> x y
#> 1 1 1, 1
df2 <- data.frame(x = 1:2)
df2$y <- list(c(1, 1), c(2, 2))
df2
#> x y
#> 1 1 1, 1
#> 2 2 2, 2
Upvotes: 6
Views: 186
Reputation: 66819
From vignette("datatable-intro"):
As long as j returns a list, each element of the list will become a column in the resulting data.table.
In your code...
dt1 <- data.table(x = 1)
dt1$y <- list(c(1, 1))
list(c(1, 1)) is treated as j, and its first element is a length-two vector, interpreted as a length-two column. Since your data.table only has one row, this yields a warning. As noted in Uwe's answer, the way around this is to wrap in an extra list(...).
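To make the "each list element is a column" rule concrete, here is a minimal sketch. The table and column names (dt, a, b) are made up for illustration, and it uses := rather than $<- for the one-row fix:

```r
library(data.table)

# In j, each element of the returned list becomes one column.
dt <- data.table(x = 1:2)
res <- dt[, list(a = x, b = x * 10L)]
names(res)       # "a" "b"

# One-row case: the outer list() means "one column"; the inner list()
# is that column's value, a list column holding one numeric vector.
dt1 <- data.table(x = 1)
dt1[, y := list(list(c(1, 1)))]
is.list(dt1$y)   # TRUE
dt1$y[[1]]       # 1 1
```

The outer list() is consumed by the column rule, so only the inner list() ends up stored in y.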
vignette("datatable-reference-semantics") brings up a convenience feature:
DT[, c("colA", "colB", ...) := list(valA, valB, ...)]
# when you have only one column to assign to you
# can drop the quotes and list(), for convenience
DT[, colA := valA]
And this works in your other code...
dt2 <- data.table(x = 1:2)
dt2$y <- list(c(1, 1), c(2, 2))
... but falls apart, as you noticed, in the special case of one row where valA should create a list column, so it's better to follow the advice in Uwe's answer: consistently wrapping in an extra list(...) or .(...).
Also see "What are the smaller syntax differences between data.frame and data.table?" in vignette("datatable-faq") for other differences with data frames.
Side note: There's little point using a data.table if you're going to assign like DT$y <- v. It kind of defeats the purpose of the package to avoid the syntax that supports modifying the table by reference, namely DT[, y := v]...
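A rough sketch of what "by reference" means here. The names alias and snapshot are just illustrative, and copy() is data.table's function for making a real copy:

```r
library(data.table)

DT <- data.table(x = 1:3)
alias <- DT               # a second name for the same table, not a copy
snapshot <- copy(DT)      # an actual, independent copy

DT[, y := x * 2L]         # adds y in place, without copying DT

"y" %in% names(alias)     # TRUE: alias sees the new column
"y" %in% names(snapshot)  # FALSE: the copy is unaffected
```

Since := changes the table in place, every name bound to that table sees the new column, which is why copy() is needed when you want to keep the old state.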
Upvotes: 2
Reputation: 42564
Besides Andre Elrico's suggestion to use the [[<- operator, consistent behaviour can also be ensured if a double-nested list() is used. This will work for the $<- operator as well as data.table's := assignment operator.
library(data.table)
dt2 <- data.table(x = 1:2)
dt2$y <- list(list(c(1, 1), c(2, 2)))
str(dt2)
dt2 <- data.table(x = 1:2)
dt2[, y := .(.(c(1, 1), c(2, 2)))]
str(dt2)
In both variants str(dt2) returns the same:
Classes ‘data.table’ and 'data.frame': 2 obs. of 2 variables:
 $ x: int 1 2
 $ y:List of 2
  ..$ : num 1 1
  ..$ : num 2 2
 - attr(*, ".internal.selfref")=<externalptr>
Please note that in data.table syntax list() can be abbreviated by .().
For comparison, here is the code which was used by the OP:
dt2 <- data.table(x = 1:2)
dt2$y <- list(c(1, 1), c(2, 2))
str(dt2)
which creates the same structure:
Classes ‘data.table’ and 'data.frame': 2 obs. of 2 variables:
 $ x: int 1 2
 $ y:List of 2
  ..$ : num 1 1
  ..$ : num 2 2
 - attr(*, ".internal.selfref")=<externalptr>
dt1 <- data.table(x = 1)
dt1$y <- list(list(c(1, 1)))
str(dt1)
dt1 <- data.table(x = 1)
dt1[, y := .(.(c(1, 1)))]
str(dt1)
Again, the output of str(dt1) is identical for both code variants and also consistent with the 2 row case.
Classes ‘data.table’ and 'data.frame': 1 obs. of 2 variables:
 $ x: num 1
 $ y:List of 1
  ..$ : num 1 1
 - attr(*, ".internal.selfref")=<externalptr>
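If you add list columns in several places, the extra wrapping could be hidden in a small helper. This is only a sketch: add_list_col is a hypothetical function, not part of data.table, and it assumes values is a plain list with one element per row.

```r
library(data.table)

# Hypothetical helper (not part of data.table): add `values`, a plain list
# with one element per row, as a list column called `name`.
add_list_col <- function(dt, name, values) {
  stopifnot(is.list(values), length(values) == nrow(dt))
  dt[, (name) := list(values)]  # outer list() means "one column"
  invisible(dt)
}

# The same call shape works for one row and for several rows.
dt1 <- add_list_col(data.table(x = 1), "y", list(c(1, 1)))
dt2 <- add_list_col(data.table(x = 1:2), "y", list(c(1, 1), c(2, 2)))
str(dt1$y)
str(dt2$y)
```

Because the helper uses :=, it modifies the table passed in by reference, so assigning the return value is optional.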
Upvotes: 3
Reputation: 11490
It's strange behavior. Feel free to open an issue about it. I don't like $ anyway, due to such problems and its static character.
For lists I like [[]]. You get consistent behavior like this:
dt1 <- data.table(x = 1)
dt1[["y"]] <- list(c(1, 1))
dt2 <- data.table(x = 1:2)
dt2[["y"]] <- list(c(1, 1), c(2, 2))
Upvotes: 2