Reputation: 893
I'm trying to create a data.table
from a data.table
that has more numerous rows than the input data.table
.
dt <- data.table(
A = sample(LETTERS, size = 1000, replace = TRUE),
B = sample(1:1000, size = 1000, replace = TRUE),
C = sample(1:1000, size = 1000, replace = TRUE),
D = sample(1:1000, size = 1000, replace = TRUE)
)
dt.fun <- function(b, c, d) {
bcd <- b + c + d
bc <- b + c
cd <- b + d
return(list(bcd, bc, cd))
}
The results I get back from dt[, .(dt.fun(B, C, D))), by = c('A')
look like below:
A V1
1: T 1294,1827,1965,1476,1680,2494,...
2: T 602, 837,1215, 907,1473,1505,...
3: T 1156,1550,1385,1240,1007,1751,...
4: B 1666,1894,1696,1683,2276,1530,...
5: B 1365,1078,1398, 830,1483, 821,...
6: B 981,1151,1082,1677,1391,1475,...
7: K 1227,1381,2173,1156,1269,1209,...
8: K 681,1066,1212, 662, 954, 999,...
9: K 814, 499,1368,1012, 933, 264,...
10: C 625,2275,1019,1449,1115,2043,...
11: C 149,1532, 927,1230, 712,1652,...
12: C 617,1627, 103, 850, 988,1067,...
13: J 1122,2048,2347,2013,1974,1146,...
14: J 298,1158,1525,1119,1362,1106,...
15: J 908,1350,1692,1055,1181, 654,...
16: E 1918,1661,1684,1378,2442,1262,...
17: E 1198, 817, 899,1340,1578, 741,...
18: E 1025,1392, 949, 808,1826, 583,...
19: M 1818,1915,1716,1083,1524,1402,...
20: M 1305, 973, 871, 993, 966, 574,...
21: M 1475,1196, 926, 918,1042,1089,...
22: U 1718,1962,1880, 564, 726,1249,...
23: U 1371,1162,1365, 430, 147, 909,...
24: U 1250,1183,1019, 246, 722, 651,...
25: G 2405,2419, 632,1753,1993,1803,...
26: G 1797,1498, 307,1325,1422,1184,...
27: G 1582,1682, 488,1335,1015,1518,...
28: P 1889,1839,1445,1603,1034,1161,...
29: P 1130,1271, 789,1556, 906, 432,...
30: P 1671,1304, 846, 818, 819,1061,...
What am I doing wrong and how can I get the function to return each element of the list as a separate column?
Upvotes: 1
Views: 127
Reputation: 160447
It isn't clear what your expected output is, but I'll take a few guesses.
First, reproducible data, I'll use a random seed:
set.seed(42)
dt <- data.table(
A = sample(LETTERS, size = 1000, replace = TRUE),
B = sample(1:1000, size = 1000, replace = TRUE),
C = sample(1:1000, size = 1000, replace = TRUE),
D = sample(1:1000, size = 1000, replace = TRUE)
)
head(dt,3)
# A B C D
# <char> <int> <int> <int>
# 1: Q 344 108 385
# 2: E 765 477 526
# 3: A 203 667 425
From here, your dt.fun
function is returning a list
, so we don't need to encapsulate its call in .( ... )
, we can use it straight-up.
dt[, dt.fun(B, C, D), by = .(A) ]
# A V1 V2 V3
# <char> <int> <int> <int>
# 1: Q 837 452 729
# 2: Q 1541 770 1529
# 3: Q 1242 656 693
# 4: Q 451 338 127
# 5: Q 2006 1071 1105
# 6: Q 1831 1147 869
# 7: Q 1325 996 1237
# 8: Q 1860 1385 937
# 9: Q 1334 1023 748
# 10: Q 1403 983 539
# ---
# 991: S 1587 1189 587
# 992: S 2039 1063 1431
# 993: S 1485 532 1449
# 994: S 1976 985 1010
# 995: S 2143 1444 1404
# 996: S 1824 1394 1132
# 997: S 1619 1126 1141
# 998: S 863 721 236
# 999: S 1496 763 1187
# 1000: S 770 193 741
Since your function does not name the columns and we did not assign them, they were assigned "V1"
, etc. We can assign them by either changing your function to end with
return(list(X=bcd, Y=bc, Z=cd))
or provide them in the assignment:
head(dt[, setNames(dt.fun(B, C, D), c("X", "Y", "Z")), by = .(A) ], 3)
# A X Y Z
# <char> <int> <int> <int>
# 1: Q 837 452 729
# 2: Q 1541 770 1529
# 3: Q 1242 656 693
If you are hoping to append the columns, though, and preserve the original "B"
and such, then we can do:
head(dt[, c("X", "Y", "Z") := dt.fun(B, C, D), by = .(A) ], 3)
# A B C D X Y Z
# <char> <int> <int> <int> <int> <int> <int>
# 1: Q 344 108 385 837 452 729
# 2: E 765 477 526 1768 1242 1291
# 3: A 203 667 425 1295 870 628
or borrowing from Waldi's since-deleted answer, we can c(.SD, ...)
to append it. I'll combine that with my setNames(.)
recommendation, not needed if you change the names in your dt.fun
function.
head(dt[, c(.SD, setNames(dt.fun(B, C, D), c("X", "Y", "Z"))), by = .(A) ], 3)
# A B C D X Y Z
# <char> <int> <int> <int> <int> <int> <int>
# 1: Q 344 108 385 837 452 729
# 2: Q 758 12 771 1541 770 1529
# 3: Q 107 549 586 1242 656 693
Upvotes: 3
Reputation: 389055
You can create a vector of new column names and assign the output to it.
library(data.table)
dt.fun <- function(b, c, d) {
bcd <- b + c + d
bc <- b + c
cd <- b + d
return(list(bcd, bc, cd))
}
new_cols = c('bcd', 'bc', 'cd')
dt[, (new_cols) := dt.fun(B, C, D), A]
dt
# A B C D bcd bc cd
# 1: F 789 12 941 1742 801 1730
# 2: O 701 51 275 1027 752 976
# 3: T 996 179 973 2148 1175 1969
# 4: T 407 392 88 887 799 495
# 5: Z 476 311 489 1276 787 965
# ---
# 996: D 266 841 120 1227 1107 386
# 997: Q 229 218 908 1355 447 1137
# 998: E 196 326 604 1126 522 800
# 999: W 882 106 488 1476 988 1370
#1000: J 580 180 679 1439 760 1259
Upvotes: 2