Reputation: 624
I am having trouble with a reasonably sized data.table containing 30 or so columns (note that I am using dummy values below).
Using rbindlist(list(dat, dat2))
to add a new data.table with the same fields and another 50000 rows produces an incorrect new master data.table.
Is there a simple and fast way to add new rows to a data.table when all the columns match?
To simplify, I have created a dummy dataset.
master.df <- data.frame(id = letters[1:10],
                        mpg = sample(c(20,22), 10, replace = TRUE),
                        cyl = sample(c(4,8), 10, replace = TRUE),
                        disp = sample(c(160,300), 10, replace = TRUE),
                        factor = sample(c(TRUE, FALSE), 10, replace = TRUE),
                        hp = sample(c(20,22), 10, replace = TRUE))
newTable.df <- data.frame(id = letters[11:15],
                          mpg = sample(c(20,22), 5, replace = TRUE),
                          cyl = sample(c(4,8), 5, replace = TRUE),
                          disp = sample(c(160,300), 5, replace = TRUE),
                          factor = sample(c(TRUE, FALSE), 10, replace = TRUE),
                          hp = sample(c(20,22), 5, replace = TRUE))
library(data.table)
dat = as.data.table(master.df)
dat2 = as.data.table(newTable.df)
Using rbind(dat,dat2)
returns the rows of dat2 twice (I expected 15 rows in total).
I searched forums for a better solution and came across rbindlist,
but that does not do the trick either; it gives the same output as rbind.
Is there a fast way to bind dat2 to dat without the duplication? This is the output I get:
id mpg cyl disp factor hp
1: a 22 8 300 FALSE 20
2: b 20 8 300 TRUE 20
3: c 20 8 160 FALSE 20
4: d 20 4 300 TRUE 22
5: e 22 4 160 FALSE 22
6: f 22 4 160 TRUE 22
7: g 20 8 160 FALSE 20
8: h 22 4 300 FALSE 20
9: i 22 4 160 FALSE 20
10: j 22 8 160 TRUE 22
11: k 22 8 160 FALSE 20
12: l 22 8 160 TRUE 20
13: m 20 8 300 TRUE 20
14: n 22 4 300 FALSE 20
15: o 20 8 160 FALSE 20
16: k 22 8 160 FALSE 20
17: l 22 8 160 FALSE 20
18: m 20 8 300 FALSE 20
19: n 22 4 300 TRUE 20
20: o 20 8 160 TRUE 20
Upvotes: 0
Views: 6981
Reputation: 6560
The problem is in how you create newTable.df,
which contains the following line:
factor = sample(c(TRUE, FALSE), 10, replace = TRUE)
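This column has 10 values while every other column has 5, so data.frame() recycles the shorter columns and the resulting table has 10 rows (instead of the 5 you intended), with ids k to o each appearing twice. Once you change this 10 to 5, the dat2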
data.table will have 5 rows, and rbind(dat, dat2)
will have 15 rows.
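As a minimal sketch of the point above (using a reduced set of columns, and with the names recycled, n_new and combined introduced here purely for illustration), the following shows the recycling and then the corrected bind. It assumes the data.table package is installed.

```r
library(data.table)

# Recycling demo: id has 5 values but factor has 10, so data.frame()
# repeats id twice and silently returns 10 rows.
recycled <- data.frame(id = letters[11:15],
                       factor = sample(c(TRUE, FALSE), 10, replace = TRUE))
nrow(recycled)  # 10, not 5

# Corrected dummy data: every column of the new table uses the same length.
n_new <- 5  # hypothetical helper variable, not in the original question
dat <- data.table(id = letters[1:10],
                  mpg = sample(c(20, 22), 10, replace = TRUE),
                  factor = sample(c(TRUE, FALSE), 10, replace = TRUE))
dat2 <- data.table(id = letters[11:15],
                   mpg = sample(c(20, 22), n_new, replace = TRUE),
                   factor = sample(c(TRUE, FALSE), n_new, replace = TRUE))

# rbindlist() takes a list of tables; use.names = TRUE matches columns by name.
combined <- rbindlist(list(dat, dat2), use.names = TRUE)
nrow(combined)  # 15
```

Checking nrow(dat2) before binding is a cheap way to catch this kind of recycling early, since neither rbind nor rbindlist can know that the extra rows were unintended.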
Upvotes: 1