Reputation: 110
I have a list of dataframes with similar variable names that I'm looking to join using full_join
in the order in which they appear in the list.
require(tidyverse)
x<-data.frame(id=c("a","a","b","b","b","c","c","c","c"),
sub.id=c("1","2","1","2","3","1","2","3","4"))
y<-data.frame(id = as.character(rep(1:4,each=2)),
sub.id = c("AA","CC","DD","AA","GG","OO","PP","OW"))
z<-data.frame(id = c("AA","CC","DD","GG","OO","OW","PP"),
sub.id = as.character(1:7))
dfs<-list(x,y,z)
I've tried using reduce
from the purrr
package but this will join all dataframes in the list to the first dataframe. In this case the x
dataframe.
dfs %>%
reduce(full_join,by = c("sub.id" = "id"))
Is there a way to perform a full_join
to the dataframes found in a list such that the by
follows the sequence that the dataframes appear on the list? In this example the sub.id
of x
would match with id
of y
and then the sub.id
from y
after joining would match the id
of z
for the final join.
EDIT: The expected result of this should be similar to the following:
id sub.id.x sub.id.y sub.id.y.y
1 a 1 AA 1
2 a 1 CC 2
3 a 2 DD 3
4 a 2 AA 1
5 b 1 AA 1
6 b 1 CC 2
7 b 2 DD 3
8 b 2 AA 1
9 b 3 GG 4
10 b 3 OO 5
11 c 1 AA 1
12 c 1 CC 2
13 c 2 DD 3
14 c 2 AA 1
15 c 3 GG 4
16 c 3 OO 5
17 c 4 PP 7
18 c 4 OW 6
Joinded column name suffixes unchanged at this time.
Upvotes: 1
Views: 1681
Reputation: 79258
If we can assume that the joining columns are always found on the end of the first dataframe and the first on the second dataframe, then you could do:
In Base R:
Reduce(function(x,y) merge(x,y,by.x = tail(names(x),1), by.y = names(y)[1], all = TRUE), dfs)
sub.id1 sub.id0 id sub.id11
1 AA 1 a 1
2 AA 2 a 1
3 AA 1 c 1
4 AA 2 b 1
5 AA 1 b 1
6 AA 2 c 1
7 CC 1 a 2
8 CC 1 b 2
9 CC 1 c 2
10 DD 2 b 3
11 DD 2 a 3
12 DD 2 c 3
13 GG 3 b 4
14 GG 3 c 4
15 OO 3 b 5
16 OO 3 c 5
17 OW 4 c 6
18 PP 4 c 7
Upvotes: 2
Reputation: 887251
Perhaps, we need a for
loop to change the column names after each join on the output generated
out <- dfs[[1]]
for(i in 2:length(dfs)) {
out <- full_join(out, dfs[[i]], by = c('sub.id' = 'id'))
names(out)[names(out) == 'sub.id'] <- paste0("sub.id", i)
names(out)[names(out) == 'sub.id.y'] <- 'sub.id'
}
-output
out
# id sub.id2 sub.id3 sub.id
#1 a 1 AA 1
#2 a 1 CC 2
#3 a 2 DD 3
#4 a 2 AA 1
#5 b 1 AA 1
#6 b 1 CC 2
#7 b 2 DD 3
#8 b 2 AA 1
#9 b 3 GG 4
#10 b 3 OO 5
#11 c 1 AA 1
#12 c 1 CC 2
#13 c 2 DD 3
#14 c 2 AA 1
#15 c 3 GG 4
#16 c 3 OO 5
#17 c 4 PP 7
#18 c 4 OW 6
Upvotes: 3