Merging multiple data frames in a list to another data frame by various columns

Question

In my code I have used multiple left_join to merge separate data frames to the one I am working on in a dplyr chain. I imported the data frames I wanted to merge with the other one into a list, then did some manipulations directly on that list with lapply to prepare them for merging.

As of now, I have used list2env(list, envir = .GlobalEnv) to create separate data frames from the list, then used left_join to merge each one separately, by unique columns for each data frame, like this:

Test data:

List:

structure(list(df2 = structure(list(x = structure(c(2L, 1L, 3L
), .Label = c("A", "B", "C"), class = "factor"), a = c(-0.331543943439452, 
0.0588350184156617, 1.03657229544754)), .Names = c("x", "a"), row.names = c(NA, 
-3L), class = "data.frame"), df3 = structure(list(z = structure(c(3L, 
2L, 1L), .Label = c("K", "L", "M"), class = "factor"), b = c(-0.897094152848114, 
0.97612075490695, 0.650264147064918)), .Names = c("z", "b"), row.names = c(NA, 
-3L), class = "data.frame")), .Names = c("df2", "df3"))

To create separate data frames:

list2env(testlist, envir = .GlobalEnv)

Data frame:

structure(list(x = structure(1:3, .Label = c("A", "B", "C"), class = "factor"), 
    y = 1:3, z = structure(1:3, .Label = c("K", "L", "M"), class = "factor")), .Names = c("x", 
"y", "z"), row.names = c(NA, -3L), class = "data.frame")

To join:

library(dplyr)

test_df %>%
    left_join(., df2, by = "x") %>%
    left_join(., df3, by = "z")

(Note that my list has around eight data frames with 2 - 3 columns each. I included only two data frames in this list for simplicity)

All the data frames have their own individual "by" column. What I want to know is if there is a simpler way to do this, f. ex by merging with the whole list directly, and automatically detecting which columns are similar and merging by them for each data frame instead of doing left_join eight separate times?

EDIT

I tried running the following code, as suggested by @akrun:

out <- test
for(i in seq_along(table_list)) {
  nm1 <- intersect(names(out), names(table_list[[i]]))
  out <- merge(out, table_list[[i]], by = nm1)
}
out

Where test is the data frame to merge to, and table_list is the list of data frames. This works for these small test data frames, but seem to introduce duplication of individual rows in the data frame, resulting in more rows.

More complex example data frame:

structure(list(x = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L
), .Label = c("A", "B", "C", "D"), class = "factor"), y = c(1, 
2, 3, 4, 1, 2, 3, 4), z = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 
1L, 2L), .Label = c("K", "L", "M"), class = "factor")), .Names = c("x", 
"y", "z"), row.names = c(NA, -8L), class = "data.frame")

phiver · Accepted Answer

Using the complicated test_df, why not use reduce from purrr together with left_join from dplyr? I have included the messages and warning message in the code below.

library(dplyr)
library(purrr)

all_dfs <- reduce(my_list, left_join, .init = test_df)

# (warning) messages from using left_join
# Joining, by = "x"
# Joining, by = "z"
# Warning message:
# Column `x` joining factors with different levels, coercing to character vector 

all_dfs

  x y z           a          b
1 A 1 K  0.05883502  0.6502641
2 B 2 L -0.33154394  0.9761208
3 C 3 M  1.03657230 -0.8970942
4 D 4 K          NA  0.6502641
5 A 1 L  0.05883502  0.9761208
6 B 2 M -0.33154394 -0.8970942
7 C 3 K  1.03657230  0.6502641
8 D 4 L          NA  0.9761208

Merging multiple data frames in a list to another data frame by various columns

Answers (2)

Related Questions