Chuck
Chuck

Reputation: 3852

R Merge Two Dataframes on columns keep columns

Apologies, if this is a duplicate please let me know, I'll gladly delete.

I am merging two datasets in R using merge.

age1 = c(5, 6, 7, 8, 10, 11) 
fname1 = c('david','alan','ben', 'ben', 'richard', 'edd') 
sname1 = c('albert','raymond','albert','pete','raymond', 'alan')
area1 = c('r','t','n','x','z','w')

df1 <- data.frame(age1, fname1, sname1, area1)

age2 = c(5, 9, 10, 3, 4, 0) 
fname2 = c('david','alan','david', 'ben', 'richard', 'edd') 
sname2 = c('albert','edd','albert','pete','raymond', 'alan')
area2 = c('w','z','x','n','t','r')

df2 = data.frame(age2, fname2, sname2, area2)

Dataset1:

df1
  age1  fname1  sname1 area1
1    5   david  albert     r
2    6    alan raymond     t
3    7     ben  albert     n
4    8     ben    pete     x
5   10 richard raymond     z
6   11     edd    alan     w

Dataset 2

df2
  age2  fname2  sname2 area2
1    5   david  albert     w
2    9    alan     edd     z
3   10   david  albert     x
4    3     ben    pete     n
5    4 richard raymond     t
6    0     edd    alan     r

I merge on fname and sname with:

matchkey <- merge(df1, df2, by.x = c("fname1", "sname1"), by.y = c("fname2", "sname2"))
View(matchkey)

Output:

> matchkey
   fname1  sname1 age1 area1 age2 area2
1     ben    pete    8     x    3     n
2   david  albert    5     r    5     w
3   david  albert    5     r   10     x
4     edd    alan   11     w    0     r
5 richard raymond   10     z    4     t

However, I want to keep the columns I have merged on. How can I do this? Should I use something other than merge?

Expected Output:

   fname1  sname1 age1 area1  fname2   sname 2age2  area2
1     ben    pete    8     x  ben    pete      3       n
2   david  albert    5     r  david  albert    5       w
3   david  albert    5     r  david  albert    10      x
4     edd    alan   11     w  edd    alan      0       r
5 richard raymond   10     z  richard raymond  4       t

I tried looking at, but to no success:

How do I combine two data-frames based on two columns?

Combining two dataframes keeping all columns

Merge two dataframes with repeated columns

Many thanks.

Upvotes: 1

Views: 3808

Answers (1)

Parfait
Parfait

Reputation: 107567

Since merged columns would be exact same on an inner join or complete match between dataframes, simply assign new columns to the columns that remained. And you can do so with transform(). Below adds an outer() combination with paste0 to retrieve needed column order:

matchkey <- transform(merge(df1, df2, by.x = c("fname1", "sname1"), 
                                      by.y = c("fname2", "sname2")),
                      fname2 = fname1, sname2 = sname1)

ordercols <- c(outer(c("fname", "sname", "age", "area"), c(1:2), paste0))
matchkey <- matchkey[ordercols]

matchkey    
#    fname1  sname1 age1 area1  fname2  sname2 age2 area2
# 1     ben    pete    8     x     ben    pete    3     n
# 2   david  albert    5     r   david  albert    5     w
# 3   david  albert    5     r   david  albert   10     x
# 4     edd    alan   11     w     edd    alan    0     r
# 5 richard raymond   10     z richard raymond    4     t

Upvotes: 2

Related Questions