sempervent
sempervent

Reputation: 893

Have function that returns a list create new columns and expand `data.table`

I'm trying to create a data.table from a data.table that has more numerous rows than the input data.table.

dt <- data.table(
  A = sample(LETTERS, size = 1000, replace = TRUE),
  B = sample(1:1000, size = 1000, replace = TRUE),
  C = sample(1:1000, size = 1000, replace = TRUE),
  D = sample(1:1000, size = 1000, replace = TRUE)
)

dt.fun <- function(b, c, d) {
  bcd <- b + c + d
  bc <- b + c
  cd <- b + d
  return(list(bcd, bc, cd))
}

The results I get back from dt[, .(dt.fun(B, C, D))), by = c('A') look like below:

   A                                V1
 1: T 1294,1827,1965,1476,1680,2494,...
 2: T  602, 837,1215, 907,1473,1505,...
 3: T 1156,1550,1385,1240,1007,1751,...
 4: B 1666,1894,1696,1683,2276,1530,...
 5: B 1365,1078,1398, 830,1483, 821,...
 6: B  981,1151,1082,1677,1391,1475,...
 7: K 1227,1381,2173,1156,1269,1209,...
 8: K  681,1066,1212, 662, 954, 999,...
 9: K  814, 499,1368,1012, 933, 264,...
10: C  625,2275,1019,1449,1115,2043,...
11: C  149,1532, 927,1230, 712,1652,...
12: C  617,1627, 103, 850, 988,1067,...
13: J 1122,2048,2347,2013,1974,1146,...
14: J  298,1158,1525,1119,1362,1106,...
15: J  908,1350,1692,1055,1181, 654,...
16: E 1918,1661,1684,1378,2442,1262,...
17: E 1198, 817, 899,1340,1578, 741,...
18: E 1025,1392, 949, 808,1826, 583,...
19: M 1818,1915,1716,1083,1524,1402,...
20: M 1305, 973, 871, 993, 966, 574,...
21: M 1475,1196, 926, 918,1042,1089,...
22: U 1718,1962,1880, 564, 726,1249,...
23: U 1371,1162,1365, 430, 147, 909,...
24: U 1250,1183,1019, 246, 722, 651,...
25: G 2405,2419, 632,1753,1993,1803,...
26: G 1797,1498, 307,1325,1422,1184,...
27: G 1582,1682, 488,1335,1015,1518,...
28: P 1889,1839,1445,1603,1034,1161,...
29: P 1130,1271, 789,1556, 906, 432,...
30: P 1671,1304, 846, 818, 819,1061,...

What am I doing wrong and how can I get the function to return each element of the list as a separate column?

Upvotes: 1

Views: 127

Answers (2)

r2evans
r2evans

Reputation: 160447

It isn't clear what your expected output is, but I'll take a few guesses.

First, reproducible data, I'll use a random seed:

set.seed(42)
dt <- data.table(
  A = sample(LETTERS, size = 1000, replace = TRUE),
  B = sample(1:1000, size = 1000, replace = TRUE),
  C = sample(1:1000, size = 1000, replace = TRUE),
  D = sample(1:1000, size = 1000, replace = TRUE)
)
head(dt,3)
#         A     B     C     D
#    <char> <int> <int> <int>
# 1:      Q   344   108   385
# 2:      E   765   477   526
# 3:      A   203   667   425

From here, your dt.fun function is returning a list, so we don't need to encapsulate its call in .( ... ), we can use it straight-up.

dt[, dt.fun(B, C, D), by = .(A) ]
#            A    V1    V2    V3
#       <char> <int> <int> <int>
#    1:      Q   837   452   729
#    2:      Q  1541   770  1529
#    3:      Q  1242   656   693
#    4:      Q   451   338   127
#    5:      Q  2006  1071  1105
#    6:      Q  1831  1147   869
#    7:      Q  1325   996  1237
#    8:      Q  1860  1385   937
#    9:      Q  1334  1023   748
#   10:      Q  1403   983   539
#   ---                         
#  991:      S  1587  1189   587
#  992:      S  2039  1063  1431
#  993:      S  1485   532  1449
#  994:      S  1976   985  1010
#  995:      S  2143  1444  1404
#  996:      S  1824  1394  1132
#  997:      S  1619  1126  1141
#  998:      S   863   721   236
#  999:      S  1496   763  1187
# 1000:      S   770   193   741

Since your function does not name the columns and we did not assign them, they were assigned "V1", etc. We can assign them by either changing your function to end with

  return(list(X=bcd, Y=bc, Z=cd))

or provide them in the assignment:

head(dt[, setNames(dt.fun(B, C, D), c("X", "Y", "Z")), by = .(A) ], 3)
#         A     X     Y     Z
#    <char> <int> <int> <int>
# 1:      Q   837   452   729
# 2:      Q  1541   770  1529
# 3:      Q  1242   656   693

If you are hoping to append the columns, though, and preserve the original "B" and such, then we can do:

head(dt[, c("X", "Y", "Z") := dt.fun(B, C, D), by = .(A) ], 3)
#         A     B     C     D     X     Y     Z
#    <char> <int> <int> <int> <int> <int> <int>
# 1:      Q   344   108   385   837   452   729
# 2:      E   765   477   526  1768  1242  1291
# 3:      A   203   667   425  1295   870   628

or borrowing from Waldi's since-deleted answer, we can c(.SD, ...) to append it. I'll combine that with my setNames(.) recommendation, not needed if you change the names in your dt.fun function.

head(dt[, c(.SD, setNames(dt.fun(B, C, D), c("X", "Y", "Z"))), by = .(A) ], 3)
#         A     B     C     D     X     Y     Z
#    <char> <int> <int> <int> <int> <int> <int>
# 1:      Q   344   108   385   837   452   729
# 2:      Q   758    12   771  1541   770  1529
# 3:      Q   107   549   586  1242   656   693

Upvotes: 3

Ronak Shah
Ronak Shah

Reputation: 389055

You can create a vector of new column names and assign the output to it.

library(data.table)

dt.fun <- function(b, c, d) {
  bcd <- b + c + d
  bc <- b + c
  cd <- b + d
  return(list(bcd, bc, cd))
}

new_cols = c('bcd', 'bc', 'cd')

dt[, (new_cols) := dt.fun(B, C, D), A]
dt

#      A   B   C   D  bcd   bc   cd
#   1: F 789  12 941 1742  801 1730
#   2: O 701  51 275 1027  752  976
#   3: T 996 179 973 2148 1175 1969
#   4: T 407 392  88  887  799  495
#   5: Z 476 311 489 1276  787  965
#  ---                             
# 996: D 266 841 120 1227 1107  386
# 997: Q 229 218 908 1355  447 1137
# 998: E 196 326 604 1126  522  800
# 999: W 882 106 488 1476  988 1370
#1000: J 580 180 679 1439  760 1259

Upvotes: 2

Related Questions