Creating new dataset by finding match in 2 other datasets in R

Question

I've tried to be clear but apologies if it's not clear enough...I have two datasets of different length. In dataset 1, for col1 = x, col2 = y

dataset 1 =

col1  col2
0.1   0.21
0.1   0.22
0.2   0.4
0.34  0.3
0.4   0.36

Equally, in dataset 2, for col3 = x, col4 = y

col3  col4
0.1   0.2
0.2   0.3
0.3   0.4
0.4   0.5
0.5   0.6
0.6   0.7
0.7   0.8

Some values in col1 and col3 match so I would like to create a new dataset where all the values that are found in col1 are written (all the values found in col3 but not col1 are ignored) but with their "matching" number in col4 such that

new_col1 new_col4
0.1      0.2
0.1      0.2
0.2      0.3
0.34     NA
0.3      0.4

Could you please explain how I can do this? Thank you.

James · Accepted Answer

Use merge:

dat1 <- data.frame(col1=c(0.1,0.1,0.2,0.34,0.4),col2=c(0.21,0.22,0.4,0.3,0.36))
dat2 <- data.frame(col3=(1:7)/10,col4=(2:8)/10)

> merge(dat1, dat2, by.x="col1", by.y="col3", all.x=TRUE)[-2]
  col1 col4
1 0.10  0.2
2 0.10  0.2
3 0.20  0.3
4 0.34   NA
5 0.40  0.5

The by parameters specify which columns to match on, all.x=TRUE tells it to use all of the first dataframe, and the final subsetting [-2] is to remove col2 from the output.

Edit:

For your extra requirement, use dat2$col3 <- trunc(dat2$col3*10)/10 before the merge.

Creating new dataset by finding match in 2 other datasets in R

Answers (1)

Related Questions