Have function that returns a list create new columns and expand `data.table`

Question

I'm trying to create a data.table from a data.table that has more numerous rows than the input data.table.

dt <- data.table(
  A = sample(LETTERS, size = 1000, replace = TRUE),
  B = sample(1:1000, size = 1000, replace = TRUE),
  C = sample(1:1000, size = 1000, replace = TRUE),
  D = sample(1:1000, size = 1000, replace = TRUE)
)

dt.fun <- function(b, c, d) {
  bcd <- b + c + d
  bc <- b + c
  cd <- b + d
  return(list(bcd, bc, cd))
}

The results I get back from dt[, .(dt.fun(B, C, D))), by = c('A') look like below:

   A                                V1
 1: T 1294,1827,1965,1476,1680,2494,...
 2: T  602, 837,1215, 907,1473,1505,...
 3: T 1156,1550,1385,1240,1007,1751,...
 4: B 1666,1894,1696,1683,2276,1530,...
 5: B 1365,1078,1398, 830,1483, 821,...
 6: B  981,1151,1082,1677,1391,1475,...
 7: K 1227,1381,2173,1156,1269,1209,...
 8: K  681,1066,1212, 662, 954, 999,...
 9: K  814, 499,1368,1012, 933, 264,...
10: C  625,2275,1019,1449,1115,2043,...
11: C  149,1532, 927,1230, 712,1652,...
12: C  617,1627, 103, 850, 988,1067,...
13: J 1122,2048,2347,2013,1974,1146,...
14: J  298,1158,1525,1119,1362,1106,...
15: J  908,1350,1692,1055,1181, 654,...
16: E 1918,1661,1684,1378,2442,1262,...
17: E 1198, 817, 899,1340,1578, 741,...
18: E 1025,1392, 949, 808,1826, 583,...
19: M 1818,1915,1716,1083,1524,1402,...
20: M 1305, 973, 871, 993, 966, 574,...
21: M 1475,1196, 926, 918,1042,1089,...
22: U 1718,1962,1880, 564, 726,1249,...
23: U 1371,1162,1365, 430, 147, 909,...
24: U 1250,1183,1019, 246, 722, 651,...
25: G 2405,2419, 632,1753,1993,1803,...
26: G 1797,1498, 307,1325,1422,1184,...
27: G 1582,1682, 488,1335,1015,1518,...
28: P 1889,1839,1445,1603,1034,1161,...
29: P 1130,1271, 789,1556, 906, 432,...
30: P 1671,1304, 846, 818, 819,1061,...

What am I doing wrong and how can I get the function to return each element of the list as a separate column?

r2evans · Accepted Answer

It isn't clear what your expected output is, but I'll take a few guesses.

First, reproducible data, I'll use a random seed:

set.seed(42)
dt <- data.table(
  A = sample(LETTERS, size = 1000, replace = TRUE),
  B = sample(1:1000, size = 1000, replace = TRUE),
  C = sample(1:1000, size = 1000, replace = TRUE),
  D = sample(1:1000, size = 1000, replace = TRUE)
)
head(dt,3)
#         A     B     C     D
#       
# 1:      Q   344   108   385
# 2:      E   765   477   526
# 3:      A   203   667   425

From here, your dt.fun function is returning a list, so we don't need to encapsulate its call in .( ... ), we can use it straight-up.

dt[, dt.fun(B, C, D), by = .(A) ]
#            A    V1    V2    V3
#          
#    1:      Q   837   452   729
#    2:      Q  1541   770  1529
#    3:      Q  1242   656   693
#    4:      Q   451   338   127
#    5:      Q  2006  1071  1105
#    6:      Q  1831  1147   869
#    7:      Q  1325   996  1237
#    8:      Q  1860  1385   937
#    9:      Q  1334  1023   748
#   10:      Q  1403   983   539
#   ---                         
#  991:      S  1587  1189   587
#  992:      S  2039  1063  1431
#  993:      S  1485   532  1449
#  994:      S  1976   985  1010
#  995:      S  2143  1444  1404
#  996:      S  1824  1394  1132
#  997:      S  1619  1126  1141
#  998:      S   863   721   236
#  999:      S  1496   763  1187
# 1000:      S   770   193   741

Since your function does not name the columns and we did not assign them, they were assigned "V1", etc. We can assign them by either changing your function to end with

  return(list(X=bcd, Y=bc, Z=cd))

or provide them in the assignment:

head(dt[, setNames(dt.fun(B, C, D), c("X", "Y", "Z")), by = .(A) ], 3)
#         A     X     Y     Z
#       
# 1:      Q   837   452   729
# 2:      Q  1541   770  1529
# 3:      Q  1242   656   693

If you are hoping to append the columns, though, and preserve the original "B" and such, then we can do:

head(dt[, c("X", "Y", "Z") := dt.fun(B, C, D), by = .(A) ], 3)
#         A     B     C     D     X     Y     Z
#          
# 1:      Q   344   108   385   837   452   729
# 2:      E   765   477   526  1768  1242  1291
# 3:      A   203   667   425  1295   870   628

or borrowing from Waldi's since-deleted answer, we can c(.SD, ...) to append it. I'll combine that with my setNames(.) recommendation, not needed if you change the names in your dt.fun function.

head(dt[, c(.SD, setNames(dt.fun(B, C, D), c("X", "Y", "Z"))), by = .(A) ], 3)
#         A     B     C     D     X     Y     Z
#          
# 1:      Q   344   108   385   837   452   729
# 2:      Q   758    12   771  1541   770  1529
# 3:      Q   107   549   586  1242   656   693

Have function that returns a list create new columns and expand `data.table`

Answers (2)

Related Questions