littleblue
littleblue

Reputation: 13

Merge a single column in two data frames in R where only some rows match

I'm trying to merge data from a column in one data frame into another column in a second data frame, using a column with shared data as the key for the operation. I've tried a couple of different tutorials on how to replicate Excel vlookup in r, and I've tried several different methods of merger provided here in answers on stackoverflow...but so far nothing I've tried does exactly what I'm trying to do, even though it seems like it would be fairly common function.

So here's what I'm trying to do...to provide a very simplistic example, let's say this is dataframe1:

ID      RESULT

4       YES
1       YES
3       NO
2       YES

And below is dataframe2, where the RESULT column is entirely populated by NAs, some of which will be replaced by what appears in dataframe1$RESULT (there are many more values in dataframe2$ID than in dataframe1$ID, so many NAs should remain after the merge):

ID      RESULT

1       NA
8       NA
2       NA
9       NA
3       NA
6       NA
7       NA
4       NA

So the end result that I want is below, dataframe3, where RESULT contains YES or NO, if one of those values were present in dataframe1$RESULT, or NA where there was no match between dataframe1$ID and dataframe2$ID.

ID      RESULT

1       YES
8       NA
2       YES
9       NA
3       NO
6       NA
7       NA
4       YES

I've managed to do a merge with the following that adds a column (RESULT.y) to dataframe1, but it also removes all of the rows from dataframe2 that don't have a match in dataframe1:

dataframe3 <- merge(dataframe2, dataframe1, by=c("ID"))

What I end up with is:

    ID      RESULT.x  RESULT.y

    4       NA        YES
    1       NA        YES
    3       NA        NO
    2       NA        YES

If it was JUST that it added a new column instead of modifying the existing one, I could work around that, but I can't figure out how to get it to leave the other rows alone rather than eliminating them. I tried the following, but it returns an error:

dataframe3 <- merge(dataframe2, dataframe1, by=c("ID", all.x = TRUE))

Error in fix.by(by.x, x) : 'by' must specify a uniquely valid column

Help will be very, very much appreciated...I've only been working in r for a couple weeks.

Upvotes: 1

Views: 5045

Answers (2)

Math
Math

Reputation: 1294

Here is you example :

ID=c(4,1,3,2)
RESULT=c("YES","YES","NO","YES")
data1=data.frame(ID,RESULT)

ID=c(1,8,2,9,3,6,7,4)
RESULT=c("NA","NA","NA","NA","NA","NA","NA","NA")
data2=data.frame(ID,RESULT)

you can just add : all=TRUE

dataframe3 <- merge(data1, data2, by=c("ID"),all=TRUE)

Upvotes: 1

C_Z_
C_Z_

Reputation: 7816

You could try rbind and select only the rows of df2 that are not in df1

rbind(dataframe1, dataframe2[!dataframe2$ID %in% dataframe1$ID,])


    ID RESULT
1   4    YES
2   1    YES
3   3     NO
4   2    YES
21  8   <NA>
41  9   <NA>
6   6   <NA>
7   7   <NA>

Upvotes: 1

Related Questions