Join list of dataframes in sequence using full_join in R

Question

I have a list of dataframes with similar variable names that I'm looking to join using full_join in the order in which they appear in the list.

require(tidyverse)

x<-data.frame(id=c("a","a","b","b","b","c","c","c","c"),
              sub.id=c("1","2","1","2","3","1","2","3","4"))

y<-data.frame(id = as.character(rep(1:4,each=2)),
              sub.id = c("AA","CC","DD","AA","GG","OO","PP","OW"))

z<-data.frame(id = c("AA","CC","DD","GG","OO","OW","PP"),
              sub.id = as.character(1:7))

dfs<-list(x,y,z)

I've tried using reduce from the purrr package but this will join all dataframes in the list to the first dataframe. In this case the x dataframe.

dfs %>% 
  reduce(full_join,by = c("sub.id" = "id"))

Is there a way to perform a full_join to the dataframes found in a list such that the by follows the sequence that the dataframes appear on the list? In this example the sub.id of x would match with id of y and then the sub.id from y after joining would match the id of z for the final join.

EDIT: The expected result of this should be similar to the following:

    id sub.id.x sub.id.y sub.id.y.y
1   a        1       AA          1
2   a        1       CC          2
3   a        2       DD          3
4   a        2       AA          1
5   b        1       AA          1
6   b        1       CC          2
7   b        2       DD          3
8   b        2       AA          1
9   b        3       GG          4
10  b        3       OO          5
11  c        1       AA          1
12  c        1       CC          2
13  c        2       DD          3
14  c        2       AA          1
15  c        3       GG          4
16  c        3       OO          5
17  c        4       PP          7
18  c        4       OW          6

Joinded column name suffixes unchanged at this time.

akrun · Accepted Answer

Perhaps, we need a for loop to change the column names after each join on the output generated

out <- dfs[[1]]
for(i in 2:length(dfs)) {
    out <- full_join(out, dfs[[i]], by = c('sub.id' = 'id'))
    names(out)[names(out) == 'sub.id'] <- paste0("sub.id", i)
    names(out)[names(out) == 'sub.id.y'] <- 'sub.id'
  }

-output

out
#   id sub.id2 sub.id3 sub.id
#1   a       1      AA      1
#2   a       1      CC      2
#3   a       2      DD      3
#4   a       2      AA      1
#5   b       1      AA      1
#6   b       1      CC      2
#7   b       2      DD      3
#8   b       2      AA      1
#9   b       3      GG      4
#10  b       3      OO      5
#11  c       1      AA      1
#12  c       1      CC      2
#13  c       2      DD      3
#14  c       2      AA      1
#15  c       3      GG      4
#16  c       3      OO      5
#17  c       4      PP      7
#18  c       4      OW      6

Join list of dataframes in sequence using full_join in R

Answers (2)

Related Questions