How to merge dataframes based on *partial* row overlap?

Question

I would like to merge dataframes in R such that only those observations whose rows partially correspond across the dataframes are kept.

I have two dataframes (these are toy dataframes; the actual ones have hundreds of columns):

    V1             V2      V3 
    rabbit         001     M
    squirrel       001     M
    cow            001     M
    rabbit         004     M
    squirrel       004     M
    skunk          004     M

    V1             V2       V3
    rabbit         001      B
    squirrel       001      B
    skunk          001      B
    rabbit         004      B
    squirrel       004      B
    skunk          008      B

Desired outcome:

    V1             V2       V3
    rabbit         001      M
    squirrel       001      M
    rabbit         004      M
    squirrel       004      M
    rabbit         001      B
    squirrel       001      B
    rabbit         004      B
    squirrel       004      B

merge and dplyr::inner_join aren't quite the right functions for this. What is?

divibisan · Accepted Answer

d.b's answer is likely much more efficient, but if you prefer to think about the problem in terms of JOIN operations, you can do this with 3 dplyr join operations:

    library(dplyr)

    # Perform an inner_join on just the columns that you want to match
    match_rows <- inner_join(df1[, 1:2], df2[, 1:2], by = c("V1", "V2"))
    match_rows
    #>         V1 V2
    #> 1   rabbit  1
    #> 2 squirrel  1
    #> 3   rabbit  4
    #> 4 squirrel  4

    # Then left_join that with each dataframe to get the matching rows from each,
    # and bind the results together as rows
    bind_rows(left_join(match_rows, df1, by = c("V1", "V2")),
              left_join(match_rows, df2, by = c("V1", "V2")))
    #>         V1 V2 V3
    #> 1   rabbit  1  M
    #> 2 squirrel  1  M
    #> 3   rabbit  4  M
    #> 4 squirrel  4  M
    #> 5   rabbit  1  B
    #> 6 squirrel  1  B
    #> 7   rabbit  4  B
    #> 8 squirrel  4  B
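
If you'd rather skip the intermediate key table, a filtering join may be a shorter route to the same rows. The sketch below is only one possible variant, not a replacement for the approach above: it rebuilds the toy dataframes from the question (assuming V2 is stored as a number, as the printed output suggests, and using a hypothetical `keys` vector to name the matching columns) and uses dplyr's `semi_join`, which keeps the rows of its first argument that have a match in the second.

```r
library(dplyr)

# Toy data from the question; V2 as numeric is an assumption
df1 <- data.frame(
  V1 = c("rabbit", "squirrel", "cow", "rabbit", "squirrel", "skunk"),
  V2 = c(1, 1, 1, 4, 4, 4),
  V3 = "M"
)
df2 <- data.frame(
  V1 = c("rabbit", "squirrel", "skunk", "rabbit", "squirrel", "skunk"),
  V2 = c(1, 1, 1, 4, 4, 8),
  V3 = "B"
)

# Columns that must agree across the two dataframes (hypothetical helper name)
keys <- c("V1", "V2")

# Keep the rows of each dataframe whose key columns also appear in the other,
# then stack the two filtered dataframes
result <- bind_rows(
  semi_join(df1, df2, by = keys),
  semi_join(df2, df1, by = keys)
)
result
```

Because `semi_join` only filters rows and never adds columns, every non-key column of each dataframe is carried through unchanged, which may be convenient when the real data has hundreds of columns.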
