vijucat

Reputation: 2088

Why is class(.SD) on a data.table showing "data.frame"?

colnames() enumerates all columns per group as expected, but class() returns exactly two rows per group, and one of them is "data.frame"!

> dt <- data.table("x"=1:3, "y"=1:3, "z"=1:3, "a"=1:3, "b"=1:3, "c"=1:3, "d"=1:3, "e"=1:3)

> dt[, class(.SD), by=.(x, y, z)]
   x y z         V1
1: 1 1 1 data.table
2: 1 1 1 data.frame
3: 2 2 2 data.table
4: 2 2 2 data.frame
5: 3 3 3 data.table
6: 3 3 3 data.frame


> dt[, colnames(.SD), by=.(x, y, z)]
    x y z V1
 1: 1 1 1  a
 2: 1 1 1  b
 3: 1 1 1  c
 4: 1 1 1  d
 5: 1 1 1  e
 6: 2 2 2  a
 7: 2 2 2  b
 8: 2 2 2  c
 9: 2 2 2  d
10: 2 2 2  e
11: 3 3 3  a
12: 3 3 3  b
13: 3 3 3  c
14: 3 3 3  d
15: 3 3 3  e

Upvotes: 2

Views: 1096

Answers (2)

vijucat

Reputation: 2088

Every data.table is a data.frame, and shows both applicable classes when asked:

> class(dt)
[1] "data.table" "data.frame"

This applies to .SD, too, because .SD is a data.table by definition. From the documentation: ".SD is a data.table containing the Subset of x's Data for each group, excluding any columns used in by (or keyby)".
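A minimal sketch of where the two rows come from, using a small made-up table (the names dt and res are only for illustration): class() returns a length-2 character vector, and a length-2 vector in j becomes two rows for each group.

```r
library(data.table)

# Small illustrative table, not the one from the question.
dt <- data.table(x = 1:3, y = 1:3)

# A data.table carries both classes, most specific first.
cls <- class(dt)

# inherits() tests membership in the class vector, so both checks are TRUE.
is_df <- inherits(dt, "data.frame")
is_dt <- is.data.table(dt)

# class(.SD) is a length-2 character vector; in j, that means
# two result rows per group (3 groups, so 6 rows in total).
res <- dt[, class(.SD), by = x]
print(res)
```

So nothing is wrong with .SD here; the result simply has one row per element of the vector that j returns.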

Upvotes: 1

jangorecki

Reputation: 16697

.SD stands for Subset of Data.table, so it is itself a data.table object. And because every data.table is also a data.frame, class(.SD) returns a length-2 character vector for each group, which is a little confusing if you expect a single row per group.
To avoid that confusion, wrap the result in a list, which forces a single row per group.

library(data.table)
dt <- data.table(x=1:3, y=1:3)
dt[, .(class = list(class(.SD))), by = x]
#   x                 class
#1: 1 data.table,data.frame
#2: 2 data.table,data.frame
#3: 3 data.table,data.frame
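If a list column is not what you want, two other options should also give one row per group. This is only a sketch on the same toy table, and the names collapsed and first_only are made up for illustration: either collapse the class vector into a single string, or keep only its first (most specific) element.

```r
library(data.table)

dt <- data.table(x = 1:3, y = 1:3)

# Option 1: collapse the class vector into one string per group.
collapsed <- dt[, .(class = paste(class(.SD), collapse = ", ")), by = x]

# Option 2: keep only the first, most specific class.
first_only <- dt[, .(class = class(.SD)[1L]), by = x]

print(collapsed)
print(first_only)
```

Both return a plain character column rather than a list column, which is often easier to print or join on.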

Upvotes: 2
