Joe
Joe

Reputation: 1768

How to join two lists of data frames obtaining a third list with the joined data frames?

I have two lists each with 18 data frames that have two columns, one containing ids, the other data. My goal is to fully join the data frames of both lists (the first one of list1 with the first one of list2 and so on) by the id column such that the resulting 18 data frames (again stored in a list) each have 3 columns (one id column, two data columns).

Plesae note that the data frames in list1 do not necessarily have the same length as the ones in list2, and that the data column of the data frames in list1 is differently named than the one in list2.

Here is a downsized example:

list1 <- list(df1 = data.frame(id_col = c(1:3), data_1 = letters[1:3]),
              df2 = data.frame(id_col = c(1:4), data_1 = letters[1:4]))
list2 <- list(df1 = data.frame(id_col = c(1:4), data_2 = LETTERS[1:4]), 
              df2 = data.frame(id_col = c(1:7), data_2 = LETTERS[1:7]))

I guess the solution goes something like this:

mapply(function(x, y) {
  # some function with e. g. dplyr::full_join 
  }, x = list1, y = list2)

Upvotes: 1

Views: 959

Answers (3)

Mike Stanley
Mike Stanley

Reputation: 1480

The way I would do this is with purrr to loop through the lists and dplyr to do the joining:

list1 <- list(df1 = data.frame(id_col = c(1:3), data_1 = letters[1:3]),
              df2 = data.frame(id_col = c(1:4), data_1 = letters[1:4]))
list2 <- list(df1 = data.frame(id_col = c(1:4), data_2 = LETTERS[1:4]), 
              df2 = data.frame(id_col = c(1:7), data_2 = LETTERS[1:7]))

purrr::map2(list1, list2, dplyr::full_join, by = "id_col")
#> $df1
#>   id_col data_1 data_2
#> 1      1      a      A
#> 2      2      b      B
#> 3      3      c      C
#> 4      4   <NA>      D
#> 
#> $df2
#>   id_col data_1 data_2
#> 1      1      a      A
#> 2      2      b      B
#> 3      3      c      C
#> 4      4      d      D
#> 5      5   <NA>      E
#> 6      6   <NA>      F
#> 7      7   <NA>      G

map2 iterates through two lists of equal length, passing the nth element of those lists to the function in the third argument. The fourth argument is also passed to the function.

Upvotes: 2

moodymudskipper
moodymudskipper

Reputation: 47320

It's a job for purrr::map2 :) :

map2(list1,list2,~merge(.x,.y))

# $df1
#   id_col data_1 data_2
# 1      1      a      A
# 2      2      b      B
# 3      3      c      C
# 
# $df2
#   id_col data_1 data_2
# 1      1      a      A
# 2      2      b      B
# 3      3      c      C
# 4      4      d      D

(adjust the parameters of merge to get the kind of join you desire)

Upvotes: 2

Eric Watt
Eric Watt

Reputation: 3240

list3 <- lapply(1:length(list1), 
                function(x) merge(list1[[x]], 
                                  list2[[x]], 
                                  by = "id_col"))
list3
# [[1]]
#   id_col data_1 data_2
# 1      1      a      A
# 2      2      b      B
# 3      3      c      C
# 
# [[2]]
#   id_col data_1 data_2
# 1      1      a      A
# 2      2      b      B
# 3      3      c      C
# 4      4      d      D

If you want a full join, specify all = TRUE

list3 <- lapply(1:length(list1), 
                function(x) merge(list1[[x]], 
                                  list2[[x]], 
                                  by = "id_col",
                                  all = TRUE))
list3
# [[1]]
#   id_col data_1 data_2
# 1      1      a      A
# 2      2      b      B
# 3      3      c      C
# 4      4   <NA>      D
# 
# [[2]]
#   id_col data_1 data_2
# 1      1      a      A
# 2      2      b      B
# 3      3      c      C
# 4      4      d      D
# 5      5   <NA>      E
# 6      6   <NA>      F
# 7      7   <NA>      G

Upvotes: 2

Related Questions