Reputation: 32
I have 2 dataframes.
df1 <- data.frame(
oID = c(111,112,113,114,115,116,117,118,119,120),
x1 = c(1,2,3,4,5,6,7,8,9,10),
x2 = c(1,2,3,4,5,6,7,8,9,10),
y1 = c(10,9,8,7,6,5,4,3,2,1),
y2 = c(10,9,8,7,6,5,4,3,2,1)
)
df2 <- data.frame(
oID = (115,116,117,118,119,120,121,122,123),
sID = c(105,106,107,108,109,110,111,112,113),
x1 = c(1,2,3,4,5,6,7,8,9),
x2 = c(1,2,2,2,2,2,2,2,2)
)
I want to add specific cases from df2 to df1. I only want to add cases, if the sID of the case matches with any of the oID in df1.
When adding a case from df2 to df1 I want to make some additional operations:
Example: Look at the case oID 123 from df2. It's sID is 113 which matches with a case in df1. I want to create a new case in df1 with the following characteristics: oID = 123; x1 = 9; x2 = 2; y1 = 8; y2 = 8
Upvotes: 0
Views: 51
Reputation: 30474
Here is a tidyverse
approach. First, add rows from df1
to a semi_join
of df1
and df2
. This will add the x values on matched rows between sID
and oID
. Then, all that is missing are the y values. If there's an sID
value (and not missing or NA
), then it will match
the oID
and use that y value.
library(tidyverse)
bind_rows(
df1,
semi_join(
df2,
df1,
by = c("sID" = "oID")
)
) %>%
mutate(across(y1:y2, ~ifelse(!is.na(sID), .[match(sID, oID)], .))) %>%
select(-sID)
An alternative is to combine rows after doing a series of two consecutive joins:
bind_rows(
df1,
semi_join(
df2,
df1,
by = c("sID" = "oID")
) %>%
left_join(
df1[, c("oID", "y1", "y2")],
by = c("sID" = "oID")
) %>%
select(-sID)
)
Output
oID x1 x2 y1 y2
1 111 1 1 10 10
2 112 2 2 9 9
3 113 3 3 8 8
4 114 4 4 7 7
5 115 5 5 6 6
6 116 6 6 5 5
7 117 7 7 4 4
8 118 8 8 3 3
9 119 9 9 2 2
10 120 10 10 1 1
11 121 7 2 10 10
12 122 8 2 9 9
13 123 9 2 8 8
Upvotes: 1
Reputation: 5887
If I understood your question correctly you can merge the tables and then compare some of the values and replace them. Thereafter you just clean up some columns again.
library(data.table)
setDT(df1)
setDT(df2)
merged <- merge(df1, df2, by.x = "oID", by.y = "sID", all.x = T)
merged[!is.na(oID.y), oID := oID.y][!is.na(x1.y), x1.x := x1.y][!is.na(x2.y), x2.x := x2.y]
merged <- merged[, .(oID, x1.x, x2.x, y1, y2)]
setnames(merged, names(df1))
merged
oID x1 x2 y1 y2
1: 121 7 2 10 10
2: 122 8 2 9 9
3: 123 9 2 8 8
4: 114 4 4 7 7
5: 115 5 5 6 6
6: 116 6 6 5 5
7: 117 7 7 4 4
8: 118 8 8 3 3
9: 119 9 9 2 2
10: 120 10 10 1 1
data
df1 <- data.frame(
oID = c(111,112,113,114,115,116,117,118,119,120),
x1 = c(1,2,3,4,5,6,7,8,9,10),
x2 = c(1,2,3,4,5,6,7,8,9,10),
y1 = c(10,9,8,7,6,5,4,3,2,1),
y2 = c(10,9,8,7,6,5,4,3,2,1)
)
df2 <- data.frame(
oID = c(115,116,117,118,119,120,121,122,123),
sID = c(105,106,107,108,109,110,111,112,113),
x1 = c(1,2,3,4,5,6,7,8,9),
x2 = c(1,2,2,2,2,2,2,2,2)
)
Upvotes: 0