Prradep
Prradep

Reputation: 5716

Is there an R function to merge two data frames based on two columns separately matching to the same column?

I would like to two populate values ("VAL") based on one of two columns separately("VALA","VALB").

# Data 
DF1 <- data.frame("colA" = rep(c("A","B"), 6),
                  "colB" = rep(c("C","D","E"), 4))

DF2 <- data.frame("colC" = c("A","B","C","D","E"),
                  "VAL" = 1:5)

# three join calls     
tmp1 <- left_join(DF1, DF2, by=c("colA"="colC"))
names(tmp1)[3] <-  "VALA"

tmp2 <- left_join(DF1, DF2, by=c("colB"="colC"))
names(tmp2)[3] <-  "VALB"

left_join(tmp1, tmp2, by=c("colA", "colB"))

#    colA colB VALA VALB
# 1     A    C    1    3
# 2     A    C    1    3
# 3     B    D    2    4
# 4     B    D    2    4
# 5     A    E    1    5
# 6     A    E    1    5
# 7     B    C    2    3
# 8     B    C    2    3
# 9     A    D    1    4
# 10    A    D    1    4
# 11    B    E    2    5
# 12    B    E    2    5
# 13    A    C    1    3
# 14    A    C    1    3
# 15    B    D    2    4
# 16    B    D    2    4
# 17    A    E    1    5
# 18    A    E    1    5
# 19    B    C    2    3
# 20    B    C    2    3
# 21    A    D    1    4
# 22    A    D    1    4
# 23    B    E    2    5
# 24    B    E    2    5

Why does the last operation give 24 rows as output instead of expected 12?

Is there any possibility to achieve the same expected out in the most elegant way(instead of 3 join operations)?

Upvotes: 2

Views: 57

Answers (2)

hoangthongtran412
hoangthongtran412

Reputation: 61

Try to combine left_join after one another using %>% and define its suffixes.

DF1 <- DF1 %>%
  left_join(DF2, c("colA" = "colC")) %>%
  left_join(DF2, c("colB" = "colC"), 
            suffix = c ("A", "B"))

> DF1
   colA colB VALA VALB
1     A    C    1    3
2     B    D    2    4
3     A    E    1    5
4     B    C    2    3
5     A    D    1    4
6     B    E    2    5
7     A    C    1    3
8     B    D    2    4
9     A    E    1    5
10    B    C    2    3
11    A    D    1    4
12    B    E    2    5

Upvotes: 1

GKi
GKi

Reputation: 39717

You can use match to find the corresponding value and cbind the resluting columns.

cbind(DF1, VALA=DF2$VAL[match(DF1$colA, DF2$colC)],
  VALB=DF2$VAL[match(DF1$colB, DF2$colC)])
   colA colB VALA VALB
#1     A    C    1    3
#2     B    D    2    4
#3     A    E    1    5
#4     B    C    2    3
#5     A    D    1    4
#6     B    E    2    5
#7     A    C    1    3
#8     B    D    2    4
#9     A    E    1    5
#10    B    C    2    3
#11    A    D    1    4
#12    B    E    2    5

or use names:

x <- setNames(DF2$VAL, DF2$colC)
cbind(DF1, VALA=x[DF1$colA], VALB=x[DF1$colB])

and in case for many columns using match inside lapply

cbind(DF1, setNames(lapply(DF1, function(x) DF2$VAL[match(x, DF2$colC)]),
 sub("col", "VAL", names(DF1))))
#   colA colB VALA VALB
#1     A    C    1    3
#2     B    D    2    4
#3     A    E    1    5
#4     B    C    2    3
#5     A    D    1    4
#6     B    E    2    5
#7     A    C    1    3
#8     B    D    2    4
#9     A    E    1    5
#10    B    C    2    3
#11    A    D    1    4
#12    B    E    2    5

Upvotes: 1

Related Questions