Scasto
Scasto

Reputation: 32

Merging data frames based on several conditions

I have 2 dataframes.

df1 <- data.frame(
oID = c(111,112,113,114,115,116,117,118,119,120),
x1 = c(1,2,3,4,5,6,7,8,9,10),
x2 = c(1,2,3,4,5,6,7,8,9,10),
y1 = c(10,9,8,7,6,5,4,3,2,1),
y2 = c(10,9,8,7,6,5,4,3,2,1)
)

df2 <- data.frame(
oID = (115,116,117,118,119,120,121,122,123),
sID = c(105,106,107,108,109,110,111,112,113),
x1 = c(1,2,3,4,5,6,7,8,9),
x2 = c(1,2,2,2,2,2,2,2,2)
)

I want to add specific cases from df2 to df1. I only want to add cases, if the sID of the case matches with any of the oID in df1.

When adding a case from df2 to df1 I want to make some additional operations:

Example: Look at the case oID 123 from df2. It's sID is 113 which matches with a case in df1. I want to create a new case in df1 with the following characteristics: oID = 123; x1 = 9; x2 = 2; y1 = 8; y2 = 8

Upvotes: 0

Views: 51

Answers (2)

Ben
Ben

Reputation: 30474

Here is a tidyverse approach. First, add rows from df1 to a semi_join of df1 and df2. This will add the x values on matched rows between sID and oID. Then, all that is missing are the y values. If there's an sID value (and not missing or NA), then it will match the oID and use that y value.

library(tidyverse)

bind_rows(
  df1,
  semi_join(
    df2,
    df1,
    by = c("sID" = "oID")
  )
) %>%
  mutate(across(y1:y2, ~ifelse(!is.na(sID), .[match(sID, oID)], .))) %>%
  select(-sID)

An alternative is to combine rows after doing a series of two consecutive joins:

bind_rows(
  df1,
  semi_join(
    df2,
    df1,
    by = c("sID" = "oID")
  ) %>%
    left_join(
      df1[, c("oID", "y1", "y2")],
      by = c("sID" = "oID")
    ) %>%
      select(-sID)
)

Output

   oID x1 x2 y1 y2
1  111  1  1 10 10
2  112  2  2  9  9
3  113  3  3  8  8
4  114  4  4  7  7
5  115  5  5  6  6
6  116  6  6  5  5
7  117  7  7  4  4
8  118  8  8  3  3
9  119  9  9  2  2
10 120 10 10  1  1
11 121  7  2 10 10
12 122  8  2  9  9
13 123  9  2  8  8

Upvotes: 1

Merijn van Tilborg
Merijn van Tilborg

Reputation: 5887

If I understood your question correctly you can merge the tables and then compare some of the values and replace them. Thereafter you just clean up some columns again.

library(data.table)

setDT(df1)
setDT(df2)

merged <- merge(df1, df2, by.x = "oID", by.y = "sID", all.x = T)
merged[!is.na(oID.y), oID := oID.y][!is.na(x1.y), x1.x := x1.y][!is.na(x2.y), x2.x := x2.y]
merged <- merged[, .(oID, x1.x, x2.x, y1, y2)]
setnames(merged, names(df1))

merged

    oID x1 x2 y1 y2
 1: 121  7  2 10 10
 2: 122  8  2  9  9
 3: 123  9  2  8  8
 4: 114  4  4  7  7
 5: 115  5  5  6  6
 6: 116  6  6  5  5
 7: 117  7  7  4  4
 8: 118  8  8  3  3
 9: 119  9  9  2  2
10: 120 10 10  1  1

data

df1 <- data.frame(
  oID = c(111,112,113,114,115,116,117,118,119,120),
  x1 = c(1,2,3,4,5,6,7,8,9,10),
  x2 = c(1,2,3,4,5,6,7,8,9,10),
  y1 = c(10,9,8,7,6,5,4,3,2,1),
  y2 = c(10,9,8,7,6,5,4,3,2,1)
)

df2 <- data.frame(
  oID = c(115,116,117,118,119,120,121,122,123),
  sID = c(105,106,107,108,109,110,111,112,113),
  x1 = c(1,2,3,4,5,6,7,8,9),
  x2 = c(1,2,2,2,2,2,2,2,2)
)

Upvotes: 0

Related Questions