Reputation: 3852
Apologies, if this is a duplicate please let me know, I'll gladly delete.
I am merging two datasets in R using merge
.
age1 = c(5, 6, 7, 8, 10, 11)
fname1 = c('david','alan','ben', 'ben', 'richard', 'edd')
sname1 = c('albert','raymond','albert','pete','raymond', 'alan')
area1 = c('r','t','n','x','z','w')
df1 <- data.frame(age1, fname1, sname1, area1)
age2 = c(5, 9, 10, 3, 4, 0)
fname2 = c('david','alan','david', 'ben', 'richard', 'edd')
sname2 = c('albert','edd','albert','pete','raymond', 'alan')
area2 = c('w','z','x','n','t','r')
df2 = data.frame(age2, fname2, sname2, area2)
Dataset1:
df1
age1 fname1 sname1 area1
1 5 david albert r
2 6 alan raymond t
3 7 ben albert n
4 8 ben pete x
5 10 richard raymond z
6 11 edd alan w
Dataset 2
df2
age2 fname2 sname2 area2
1 5 david albert w
2 9 alan edd z
3 10 david albert x
4 3 ben pete n
5 4 richard raymond t
6 0 edd alan r
I merge on fname
and sname
with:
matchkey <- merge(df1, df2, by.x = c("fname1", "sname1"), by.y = c("fname2", "sname2"))
View(matchkey)
Output:
> matchkey
fname1 sname1 age1 area1 age2 area2
1 ben pete 8 x 3 n
2 david albert 5 r 5 w
3 david albert 5 r 10 x
4 edd alan 11 w 0 r
5 richard raymond 10 z 4 t
However, I want to keep the columns I have merged on. How can I do this? Should I use something other than merge?
Expected Output:
fname1 sname1 age1 area1 fname2 sname 2age2 area2
1 ben pete 8 x ben pete 3 n
2 david albert 5 r david albert 5 w
3 david albert 5 r david albert 10 x
4 edd alan 11 w edd alan 0 r
5 richard raymond 10 z richard raymond 4 t
I tried looking at, but to no success:
How do I combine two data-frames based on two columns?
Combining two dataframes keeping all columns
Merge two dataframes with repeated columns
Many thanks.
Upvotes: 1
Views: 3808
Reputation: 107567
Since merged columns would be exact same on an inner join or complete match between dataframes, simply assign new columns to the columns that remained. And you can do so with transform()
. Below adds an outer()
combination with paste0
to retrieve needed column order:
matchkey <- transform(merge(df1, df2, by.x = c("fname1", "sname1"),
by.y = c("fname2", "sname2")),
fname2 = fname1, sname2 = sname1)
ordercols <- c(outer(c("fname", "sname", "age", "area"), c(1:2), paste0))
matchkey <- matchkey[ordercols]
matchkey
# fname1 sname1 age1 area1 fname2 sname2 age2 area2
# 1 ben pete 8 x ben pete 3 n
# 2 david albert 5 r david albert 5 w
# 3 david albert 5 r david albert 10 x
# 4 edd alan 11 w edd alan 0 r
# 5 richard raymond 10 z richard raymond 4 t
Upvotes: 2