Haakonkas
Haakonkas

Reputation: 1041

Merging multiple data frames in a list to another data frame by various columns

In my code I have used multiple left_join to merge separate data frames to the one I am working on in a dplyr chain. I imported the data frames I wanted to merge with the other one into a list, then did some manipulations directly on that list with lapply to prepare them for merging.

As of now, I have used list2env(list, envir = .GlobalEnv) to create separate data frames from the list, then used left_join to merge each one separately, by unique columns for each data frame, like this:

Test data:

List:

structure(list(df2 = structure(list(x = structure(c(2L, 1L, 3L
), .Label = c("A", "B", "C"), class = "factor"), a = c(-0.331543943439452, 
0.0588350184156617, 1.03657229544754)), .Names = c("x", "a"), row.names = c(NA, 
-3L), class = "data.frame"), df3 = structure(list(z = structure(c(3L, 
2L, 1L), .Label = c("K", "L", "M"), class = "factor"), b = c(-0.897094152848114, 
0.97612075490695, 0.650264147064918)), .Names = c("z", "b"), row.names = c(NA, 
-3L), class = "data.frame")), .Names = c("df2", "df3"))

To create separate data frames:

list2env(testlist, envir = .GlobalEnv)

Data frame:

structure(list(x = structure(1:3, .Label = c("A", "B", "C"), class = "factor"), 
    y = 1:3, z = structure(1:3, .Label = c("K", "L", "M"), class = "factor")), .Names = c("x", 
"y", "z"), row.names = c(NA, -3L), class = "data.frame")

To join:

library(dplyr)

test_df %>%
    left_join(., df2, by = "x") %>%
    left_join(., df3, by = "z")

(Note that my list has around eight data frames with 2 - 3 columns each. I included only two data frames in this list for simplicity)

All the data frames have their own individual "by" column. What I want to know is if there is a simpler way to do this, f. ex by merging with the whole list directly, and automatically detecting which columns are similar and merging by them for each data frame instead of doing left_join eight separate times?

EDIT

I tried running the following code, as suggested by @akrun:

out <- test
for(i in seq_along(table_list)) {
  nm1 <- intersect(names(out), names(table_list[[i]]))
  out <- merge(out, table_list[[i]], by = nm1)
}
out

Where test is the data frame to merge to, and table_list is the list of data frames. This works for these small test data frames, but seem to introduce duplication of individual rows in the data frame, resulting in more rows.

More complex example data frame:

structure(list(x = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L
), .Label = c("A", "B", "C", "D"), class = "factor"), y = c(1, 
2, 3, 4, 1, 2, 3, 4), z = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 
1L, 2L), .Label = c("K", "L", "M"), class = "factor")), .Names = c("x", 
"y", "z"), row.names = c(NA, -8L), class = "data.frame")

Upvotes: 1

Views: 1076

Answers (2)

phiver
phiver

Reputation: 23608

Using the complicated test_df, why not use reduce from purrr together with left_join from dplyr? I have included the messages and warning message in the code below.

library(dplyr)
library(purrr)

all_dfs <- reduce(my_list, left_join, .init = test_df)

# (warning) messages from using left_join
# Joining, by = "x"
# Joining, by = "z"
# Warning message:
# Column `x` joining factors with different levels, coercing to character vector 

all_dfs

  x y z           a          b
1 A 1 K  0.05883502  0.6502641
2 B 2 L -0.33154394  0.9761208
3 C 3 M  1.03657230 -0.8970942
4 D 4 K          NA  0.6502641
5 A 1 L  0.05883502  0.9761208
6 B 2 M -0.33154394 -0.8970942
7 C 3 K  1.03657230  0.6502641
8 D 4 L          NA  0.9761208

Upvotes: 1

akrun
akrun

Reputation: 886938

Based on the description, it seems that we need to check for intersecting column names before each merge

out <- test_df
for(i in seq_along(testlist)) {
   nm1 <- intersect(names(test_df), names(testlist[[i]]))
   out <- merge(out, testlist[[i]], by = nm1, all.x = TRUE)
 }

out
#  z x y           a          b
#1 K A 1  0.05883502  0.6502641
#2 L B 2 -0.33154394  0.9761208
#3 M C 3  1.03657230 -0.8970942

Upvotes: 1

Related Questions