Reputation: 1027
Due to time constraints, I've decided to use data tables in my code instead of data frames, as they are much faster. However, I still want the functionality of data frames. I need to merge two data tables, conserving all values (like setting all=TRUE in merge).
Some example code:
> x1 = data.frame(index = 1:10)
> y1 = data.frame(index = c(2,4,6), weight = c(.2, .5, .3))
> x1
index
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 10
> y1
index weight
1 2 0.2
2 4 0.5
3 6 0.3
> merge(x,y, all=TRUE)
index weight
[1,] 1 NA
[2,] 2 1
[3,] 3 NA
[4,] 4 2
[5,] 5 NA
[6,] 6 3
[7,] 7 NA
[8,] 8 NA
[9,] 9 NA
[10,] 10 NA
Now can I do a similar thing with data tables? (The NA's don't necessarily have to stay, I change them to 0's anyways).
> x2 = data.table(index = 1:10, key ="index")
> y2 = data.table(index = c(2,4,6), weight= c(.3,.5,.2))
I know you can merge, but I also know that there is a faster way.
Upvotes: 12
Views: 5628
Reputation: 6723
I use a function like:
mergefast<-function(x,y,by.x,by.y,all) {
x_dt<-data.table(x)
y2<-y
for (i in 1:length(by.y)) names(y2)[grep(by.y[i],names(y2))]<-by.x[i]
y_dt<-data.table(y2)
setkeyv(x_dt,by.x)
setkeyv(y_dt,by.x)
as.data.frame(merge(x_dt,y_dt,by=by.x,all=all))
}
which can be used in your example as:
mergefast(x1,y1,by.x="index",by.y="index",all=T)
It's a bit lacking in features that merge
has, e.g. by
, all.x
, all.y
, but these can be easily incorporated.
Upvotes: 1
Reputation: 7475
so following on from Translating SQL joins on foreign keys to R data.table syntax
x2 = data.table(index = 1:10, key ="index")
y2 = data.table(index = c(2,4,6), weight= c(.3,.5,.2),key="index")
y2[J(x2$index)]
Upvotes: 8