A.Benson
A.Benson

Reputation: 479

Make data table using matched values from two data tables using R

I have two data.tables(a and b) using R which are approx 10,000 rows each. Each data.tablecontains three columns, x, y, and z. The z values are unique in each data.table. Each data.table shares a number of identical x and y values. What I need to do, is create a new data.table (c) which contains only the rows from a, where the x and y values are identical to those in b.

Here is an example of what I need to achieve.

data table a

x    y      z
a    1    100
a    6    120
c    5    234
b    3    567
d    8    645
f    7    487

data table b

x    y     z
a    1    904
b    6    120
c    7    765
e    3    567
d    8    329
a    0    638

And data table c (the newly created data table) should look like this

x    y     z
a    1    100
d    8    645

I've looked here (Matching values between data frames based on overlapping dates), but it didn't quite get me to where I needed to be.

Here's the code to generate the example data.tables.

library(data.table)

x<-c("a","a","c","b","d","f")
y<-c(1,6,5,3,8,7)
z<-c(100,120,234,567,645,487)
a<-data.frame(x,y,z)
rm(x,y,z)

x<-c("a","b","c","e","d","a")
y<-c(1,6,7,3,8,0)
z<-c(904,120,765,567,329,638)
b<-data.frame(x,y,z)

setDT(a)
setDT(b)

Happy to take a suggestion that uses traditional data.frames

Thanks

Upvotes: 0

Views: 235

Answers (3)

nghauran
nghauran

Reputation: 6768

Try:

# Note that they should be different values of z for identical x and y
# The following options assume that you also want to keep this information i.e. z values 
# from a and b for matched columns x and y
dplyr::inner_join(a, b, by = c("x", "y")) # option 1
merge(a, b, by = c("x","y")) # option 2
# option 3 using DT
setkeyv(a, c("x","y"))
setkeyv(b, c("x","y"))
a[b, nomatch = 0]

Upvotes: 0

Jon Spring
Jon Spring

Reputation: 66520

Alternatively, a dplyr approach:

library(dplyr)
c <- a %>% semi_join(b, by = c("x", "y"))

> c
  x y   z
1 a 1 100
2 d 8 645

Upvotes: 0

DanY
DanY

Reputation: 6073

A simple merge will do it:

merge(a, b, by=c("x","y"))

By default, merge keeps only rows from both datasets where there is a match; you can change this behavior with the all, all.x, and all.y args, but here you don't need to.

If you don't want the z values from datatable b, do this:

merge(a, b[,.(x,y)], by=c("x","y"))

Upvotes: 3

Related Questions