Join list of dataframes in sequence using full_join in R

I have a list of dataframes with similar variable names that I'm looking to join using full_join in the order in which they appear in the list.

require(tidyverse)

x<-data.frame(id=c("a","a","b","b","b","c","c","c","c"),
              sub.id=c("1","2","1","2","3","1","2","3","4"))

y<-data.frame(id = as.character(rep(1:4,each=2)),
              sub.id = c("AA","CC","DD","AA","GG","OO","PP","OW"))

z<-data.frame(id = c("AA","CC","DD","GG","OO","OW","PP"),
              sub.id = as.character(1:7))

dfs<-list(x,y,z)

I've tried using reduce from the purrr package but this will join all dataframes in the list to the first dataframe. In this case the x dataframe.

dfs %>% 
  reduce(full_join,by = c("sub.id" = "id"))

Is there a way to perform a full_join to the dataframes found in a list such that the by follows the sequence that the dataframes appear on the list? In this example the sub.id of x would match with id of y and then the sub.id from y after joining would match the id of z for the final join.

EDIT: The expected result of this should be similar to the following:

    id sub.id.x sub.id.y sub.id.y.y
1   a        1       AA          1
2   a        1       CC          2
3   a        2       DD          3
4   a        2       AA          1
5   b        1       AA          1
6   b        1       CC          2
7   b        2       DD          3
8   b        2       AA          1
9   b        3       GG          4
10  b        3       OO          5
11  c        1       AA          1
12  c        1       CC          2
13  c        2       DD          3
14  c        2       AA          1
15  c        3       GG          4
16  c        3       OO          5
17  c        4       PP          7
18  c        4       OW          6

Joinded column name suffixes unchanged at this time.

Upvotes: 1

Views: 1681

Answers (2)

Onyambu
Onyambu

Reputation: 79258

If we can assume that the joining columns are always found on the end of the first dataframe and the first on the second dataframe, then you could do:

In Base R:

Reduce(function(x,y) merge(x,y,by.x = tail(names(x),1), by.y = names(y)[1], all = TRUE), dfs)

   sub.id1 sub.id0 id sub.id11
1       AA       1  a        1
2       AA       2  a        1
3       AA       1  c        1
4       AA       2  b        1
5       AA       1  b        1
6       AA       2  c        1
7       CC       1  a        2
8       CC       1  b        2
9       CC       1  c        2
10      DD       2  b        3
11      DD       2  a        3
12      DD       2  c        3
13      GG       3  b        4
14      GG       3  c        4
15      OO       3  b        5
16      OO       3  c        5
17      OW       4  c        6
18      PP       4  c        7

Upvotes: 2

akrun
akrun

Reputation: 887251

Perhaps, we need a for loop to change the column names after each join on the output generated

out <- dfs[[1]]
for(i in 2:length(dfs)) {
    out <- full_join(out, dfs[[i]], by = c('sub.id' = 'id'))
    names(out)[names(out) == 'sub.id'] <- paste0("sub.id", i)
    names(out)[names(out) == 'sub.id.y'] <- 'sub.id'
  }

-output

out
#   id sub.id2 sub.id3 sub.id
#1   a       1      AA      1
#2   a       1      CC      2
#3   a       2      DD      3
#4   a       2      AA      1
#5   b       1      AA      1
#6   b       1      CC      2
#7   b       2      DD      3
#8   b       2      AA      1
#9   b       3      GG      4
#10  b       3      OO      5
#11  c       1      AA      1
#12  c       1      CC      2
#13  c       2      DD      3
#14  c       2      AA      1
#15  c       3      GG      4
#16  c       3      OO      5
#17  c       4      PP      7
#18  c       4      OW      6

Upvotes: 3

Related Questions