Merging data frames based on several conditions

Question

I have 2 dataframes.

df1 <- data.frame(
oID = c(111,112,113,114,115,116,117,118,119,120),
x1 = c(1,2,3,4,5,6,7,8,9,10),
x2 = c(1,2,3,4,5,6,7,8,9,10),
y1 = c(10,9,8,7,6,5,4,3,2,1),
y2 = c(10,9,8,7,6,5,4,3,2,1)
)

df2 <- data.frame(
oID = (115,116,117,118,119,120,121,122,123),
sID = c(105,106,107,108,109,110,111,112,113),
x1 = c(1,2,3,4,5,6,7,8,9),
x2 = c(1,2,2,2,2,2,2,2,2)
)

I want to add specific cases from df2 to df1. I only want to add cases, if the sID of the case matches with any of the oID in df1.

When adding a case from df2 to df1 I want to make some additional operations:

I want the case to keep some variables (oID, x1,x2,..).
I want to add some variables to the case (y1,y2,...). The levels of the added variables should be copied from the case with the matching oID in df1.

Example: Look at the case oID 123 from df2. It's sID is 113 which matches with a case in df1. I want to create a new case in df1 with the following characteristics: oID = 123; x1 = 9; x2 = 2; y1 = 8; y2 = 8

Ben · Accepted Answer

Here is a tidyverse approach. First, add rows from df1 to a semi_join of df1 and df2. This will add the x values on matched rows between sID and oID. Then, all that is missing are the y values. If there's an sID value (and not missing or NA), then it will match the oID and use that y value.

library(tidyverse)

bind_rows(
  df1,
  semi_join(
    df2,
    df1,
    by = c("sID" = "oID")
  )
) %>%
  mutate(across(y1:y2, ~ifelse(!is.na(sID), .[match(sID, oID)], .))) %>%
  select(-sID)

An alternative is to combine rows after doing a series of two consecutive joins:

bind_rows(
  df1,
  semi_join(
    df2,
    df1,
    by = c("sID" = "oID")
  ) %>%
    left_join(
      df1[, c("oID", "y1", "y2")],
      by = c("sID" = "oID")
    ) %>%
      select(-sID)
)

Output

   oID x1 x2 y1 y2
1  111  1  1 10 10
2  112  2  2  9  9
3  113  3  3  8  8
4  114  4  4  7  7
5  115  5  5  6  6
6  116  6  6  5  5
7  117  7  7  4  4
8  118  8  8  3  3
9  119  9  9  2  2
10 120 10 10  1  1
11 121  7  2 10 10
12 122  8  2  9  9
13 123  9  2  8  8

Merging data frames based on several conditions

Answers (2)

Related Questions