Victor
Victor

Reputation: 27

Compare each row in a dataframe with multiple rows from another dataframe and get result

I have 2 data sets df1 and df2.

df1
c1  match   c3      c4
AA1 AB      cat     dog
AA1 CD      dfs     abd
AA1 EF      js      hn
AA1 GH      bsk     jtd
AA2 AB      cat     mouse
AA2 CD      adb     mop
AA2 EF      powas   qwert
AA2 GH      sms     mms
AA3 AB      i       j
AA3 CD      fgh     ejk
AA3 EF      mib     loi
AA3 GH      revit   roger

df2
match   d2      result
AB      cat     friendly
AB      mouse   enemy
CD      dfs     r1
CD      adb     r1
CD      fgh     r2
CD      ejk     r3
EF      mib     some_result
GH      sms     sent
GH      mms     sent
IJ      xxx     yyy
KL      crt     zzz
KL      rrr     qqq

I want to match match df1 and df2 by column "match" and add 2 new columns "result_c1" and "result_c2" in df1. result_c1 gets the corresponding result from df2 by first matching the match column and then matching c3 from df1 to d2 from df2. result_c2 gets the corresponding result from df2 by first matching the match column and then matching c4 from df1 to d2 from df2. If there is no match, it returns "no_match". Is there an efficient way to do this?

result
c1  match   c3      c4      result_c1   result_c2   
AA1 AB      cat     dog     friendly    no_match    
AA1 CD      dfs     adb     r1          r1          
AA1 EF      js      hn      no_match    no_match    
AA1 GH      bsk     jtd     no_match    no_match    
AA2 AB      cat     mouse   friendly    enemy       
AA2 CD      adb     mop     r1          no_match    
AA2 EF      powas   qwert   no_match    no_match    
AA2 GH      sms     mms     sent        sent        
AA3 AB      i       j       no_match    no_match    
AA3 CD      fgh     ejk     r2          r3          
AA3 EF      mib     loi     some_result no_match    
AA3 GH      revit   roger   no_match    no_match    

The data is attached below:

df1 <- data.frame(list(c1 = c("AA1", "AA1", "AA1", "AA1", "AA2", "AA2", "AA2", "AA2",
                      "AA3", "AA3", "AA3", "AA3"), match = c("AB", "CD", "EF", "GH", 
                                                             "AB", "CD", "EF", "GH", 
                                                             "AB", "CD", "EF", "GH"),
                      c3 = c("cat", "dfs", "js", "bsk", "cat", "adb", "powas", "sms", "i",
                      "fgh", "mib", "revit"), c4 = c("dog", "abd", "hn", "jtd", "mouse",
                                                     "mop", "qwert", "mms", "j", "ejk", "loi", "roger")))

df2 <- data.frame(list(match = c("AB", "AB", "CD", "CD", "CD", "CD", "EF", "GH", "GH", "IJ", "KL", "KL"), 
                       d2 = c("cat", "mouse", "dfs", "adb", "fgh", "ejk", "mib", "sms", "mms", "xxx", "crt", "rrr"),
                       result = c("friendly", "enemy", "r1", "r1", "r2", "r3", "some_result", "sent", "sent", "yyy", "zzz", "qqq")))

Thank you.

Upvotes: 1

Views: 153

Answers (2)

ThomasIsCoding
ThomasIsCoding

Reputation: 101034

Here is a solution with base R:

df1$result_c1 = with(df1,ifelse(is.na(match(paste(match,c3),with(df2,paste(match,d2)))),
                                "no match",
                                as.character(df2$result[match(paste(match,c3),with(df2,paste(match,d2)))])))
df1$result_c2 = with(df1,ifelse(is.na(match(paste(match,c4),with(df2,paste(match,d2)))),
                                "no match",
                                as.character(df2$result[match(paste(match,c4),with(df2,paste(match,d2)))])))

such that

> df1
    c1 match    c3    c4   result_c1 result_c2
1  AA1    AB   cat   dog    friendly  no match
2  AA1    CD   dfs   abd          r1        r1
3  AA1    EF    js    hn    no match  no match
4  AA1    GH   bsk   jtd    no match  no match
5  AA2    AB   cat mouse    friendly     enemy
6  AA2    CD   adb   mop    no match  no match
7  AA2    EF powas qwert    no match  no match
8  AA2    GH   sms   mms        sent      sent
9  AA3    AB     i     j    no match  no match
10 AA3    CD   fgh   ejk          r2        r3
11 AA3    EF   mib   loi some_result  no match
12 AA3    GH revit roger    no match  no match

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 388797

One way with dplyr using a custom function

apply_fun <- function(x, y, r) {
   inds <- x %in% y
   if (any(inds)) r[match(x[which.max(inds)], y)] else "no_match"
}

library(dplyr)
df1 %>%
  left_join(df2, by = "match") %>%
  mutate_all(as.character) %>%
  group_by(c1, match) %>%
  summarise(result_c1 = apply_fun(c3, d2, result), 
            result_c2 = apply_fun(c4, d2, result))

#   c1    match result_c1   result_c2
#   <chr> <chr> <chr>       <chr>    
# 1 AA1   AB    friendly    no_match 
# 2 AA1   CD    r1          no_match 
# 3 AA1   EF    no_match    no_match 
# 4 AA1   GH    no_match    no_match 
# 5 AA2   AB    friendly    enemy    
# 6 AA2   CD    r1          no_match 
# 7 AA2   EF    no_match    no_match 
# 8 AA2   GH    sent        sent     
# 9 AA3   AB    no_match    no_match 
#10 AA3   CD    r2          r3       
#11 AA3   EF    some_result no_match 
#12 AA3   GH    no_match    no_match 

Upvotes: 1

Related Questions