Reputation: 27
I have 2 data sets df1 and df2.
df1
c1 match c3 c4
AA1 AB cat dog
AA1 CD dfs abd
AA1 EF js hn
AA1 GH bsk jtd
AA2 AB cat mouse
AA2 CD adb mop
AA2 EF powas qwert
AA2 GH sms mms
AA3 AB i j
AA3 CD fgh ejk
AA3 EF mib loi
AA3 GH revit roger
df2
match d2 result
AB cat friendly
AB mouse enemy
CD dfs r1
CD adb r1
CD fgh r2
CD ejk r3
EF mib some_result
GH sms sent
GH mms sent
IJ xxx yyy
KL crt zzz
KL rrr qqq
I want to match match df1 and df2 by column "match" and add 2 new columns "result_c1" and "result_c2" in df1. result_c1 gets the corresponding result from df2 by first matching the match column and then matching c3 from df1 to d2 from df2. result_c2 gets the corresponding result from df2 by first matching the match column and then matching c4 from df1 to d2 from df2. If there is no match, it returns "no_match". Is there an efficient way to do this?
result
c1 match c3 c4 result_c1 result_c2
AA1 AB cat dog friendly no_match
AA1 CD dfs adb r1 r1
AA1 EF js hn no_match no_match
AA1 GH bsk jtd no_match no_match
AA2 AB cat mouse friendly enemy
AA2 CD adb mop r1 no_match
AA2 EF powas qwert no_match no_match
AA2 GH sms mms sent sent
AA3 AB i j no_match no_match
AA3 CD fgh ejk r2 r3
AA3 EF mib loi some_result no_match
AA3 GH revit roger no_match no_match
The data is attached below:
df1 <- data.frame(list(c1 = c("AA1", "AA1", "AA1", "AA1", "AA2", "AA2", "AA2", "AA2",
"AA3", "AA3", "AA3", "AA3"), match = c("AB", "CD", "EF", "GH",
"AB", "CD", "EF", "GH",
"AB", "CD", "EF", "GH"),
c3 = c("cat", "dfs", "js", "bsk", "cat", "adb", "powas", "sms", "i",
"fgh", "mib", "revit"), c4 = c("dog", "abd", "hn", "jtd", "mouse",
"mop", "qwert", "mms", "j", "ejk", "loi", "roger")))
df2 <- data.frame(list(match = c("AB", "AB", "CD", "CD", "CD", "CD", "EF", "GH", "GH", "IJ", "KL", "KL"),
d2 = c("cat", "mouse", "dfs", "adb", "fgh", "ejk", "mib", "sms", "mms", "xxx", "crt", "rrr"),
result = c("friendly", "enemy", "r1", "r1", "r2", "r3", "some_result", "sent", "sent", "yyy", "zzz", "qqq")))
Thank you.
Upvotes: 1
Views: 153
Reputation: 101034
Here is a solution with base R
:
df1$result_c1 = with(df1,ifelse(is.na(match(paste(match,c3),with(df2,paste(match,d2)))),
"no match",
as.character(df2$result[match(paste(match,c3),with(df2,paste(match,d2)))])))
df1$result_c2 = with(df1,ifelse(is.na(match(paste(match,c4),with(df2,paste(match,d2)))),
"no match",
as.character(df2$result[match(paste(match,c4),with(df2,paste(match,d2)))])))
such that
> df1
c1 match c3 c4 result_c1 result_c2
1 AA1 AB cat dog friendly no match
2 AA1 CD dfs abd r1 r1
3 AA1 EF js hn no match no match
4 AA1 GH bsk jtd no match no match
5 AA2 AB cat mouse friendly enemy
6 AA2 CD adb mop no match no match
7 AA2 EF powas qwert no match no match
8 AA2 GH sms mms sent sent
9 AA3 AB i j no match no match
10 AA3 CD fgh ejk r2 r3
11 AA3 EF mib loi some_result no match
12 AA3 GH revit roger no match no match
Upvotes: 1
Reputation: 388797
One way with dplyr
using a custom function
apply_fun <- function(x, y, r) {
inds <- x %in% y
if (any(inds)) r[match(x[which.max(inds)], y)] else "no_match"
}
library(dplyr)
df1 %>%
left_join(df2, by = "match") %>%
mutate_all(as.character) %>%
group_by(c1, match) %>%
summarise(result_c1 = apply_fun(c3, d2, result),
result_c2 = apply_fun(c4, d2, result))
# c1 match result_c1 result_c2
# <chr> <chr> <chr> <chr>
# 1 AA1 AB friendly no_match
# 2 AA1 CD r1 no_match
# 3 AA1 EF no_match no_match
# 4 AA1 GH no_match no_match
# 5 AA2 AB friendly enemy
# 6 AA2 CD r1 no_match
# 7 AA2 EF no_match no_match
# 8 AA2 GH sent sent
# 9 AA3 AB no_match no_match
#10 AA3 CD r2 r3
#11 AA3 EF some_result no_match
#12 AA3 GH no_match no_match
Upvotes: 1