Reputation: 1041
In my code I have used multiple left_join
to merge separate data frames to the one I am working on in a dplyr chain. I imported the data frames I wanted to merge with the other one into a list, then did some manipulations directly on that list with lapply to prepare them for merging.
As of now, I have used list2env(list, envir = .GlobalEnv)
to create separate data frames from the list, then used left_join
to merge each one separately, by unique columns for each data frame, like this:
Test data:
List:
structure(list(df2 = structure(list(x = structure(c(2L, 1L, 3L
), .Label = c("A", "B", "C"), class = "factor"), a = c(-0.331543943439452,
0.0588350184156617, 1.03657229544754)), .Names = c("x", "a"), row.names = c(NA,
-3L), class = "data.frame"), df3 = structure(list(z = structure(c(3L,
2L, 1L), .Label = c("K", "L", "M"), class = "factor"), b = c(-0.897094152848114,
0.97612075490695, 0.650264147064918)), .Names = c("z", "b"), row.names = c(NA,
-3L), class = "data.frame")), .Names = c("df2", "df3"))
To create separate data frames:
list2env(testlist, envir = .GlobalEnv)
Data frame:
structure(list(x = structure(1:3, .Label = c("A", "B", "C"), class = "factor"),
y = 1:3, z = structure(1:3, .Label = c("K", "L", "M"), class = "factor")), .Names = c("x",
"y", "z"), row.names = c(NA, -3L), class = "data.frame")
To join:
library(dplyr)
test_df %>%
left_join(., df2, by = "x") %>%
left_join(., df3, by = "z")
(Note that my list has around eight data frames with 2 - 3 columns each. I included only two data frames in this list for simplicity)
All the data frames have their own individual "by" column. What I want to know is if there is a simpler way to do this, f. ex by merging with the whole list directly, and automatically detecting which columns are similar and merging by them for each data frame instead of doing left_join eight separate times?
EDIT
I tried running the following code, as suggested by @akrun:
out <- test
for(i in seq_along(table_list)) {
nm1 <- intersect(names(out), names(table_list[[i]]))
out <- merge(out, table_list[[i]], by = nm1)
}
out
Where test
is the data frame to merge to, and table_list
is the list of data frames.
This works for these small test data frames, but seem to introduce duplication of individual rows in the data frame, resulting in more rows.
More complex example data frame:
structure(list(x = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L
), .Label = c("A", "B", "C", "D"), class = "factor"), y = c(1,
2, 3, 4, 1, 2, 3, 4), z = structure(c(1L, 2L, 3L, 1L, 2L, 3L,
1L, 2L), .Label = c("K", "L", "M"), class = "factor")), .Names = c("x",
"y", "z"), row.names = c(NA, -8L), class = "data.frame")
Upvotes: 1
Views: 1076
Reputation: 23608
Using the complicated test_df, why not use reduce
from purrr
together with left_join from dplyr? I have included the messages and warning message in the code below.
library(dplyr)
library(purrr)
all_dfs <- reduce(my_list, left_join, .init = test_df)
# (warning) messages from using left_join
# Joining, by = "x"
# Joining, by = "z"
# Warning message:
# Column `x` joining factors with different levels, coercing to character vector
all_dfs
x y z a b
1 A 1 K 0.05883502 0.6502641
2 B 2 L -0.33154394 0.9761208
3 C 3 M 1.03657230 -0.8970942
4 D 4 K NA 0.6502641
5 A 1 L 0.05883502 0.9761208
6 B 2 M -0.33154394 -0.8970942
7 C 3 K 1.03657230 0.6502641
8 D 4 L NA 0.9761208
Upvotes: 1
Reputation: 886938
Based on the description, it seems that we need to check for intersecting column names before each merge
out <- test_df
for(i in seq_along(testlist)) {
nm1 <- intersect(names(test_df), names(testlist[[i]]))
out <- merge(out, testlist[[i]], by = nm1, all.x = TRUE)
}
out
# z x y a b
#1 K A 1 0.05883502 0.6502641
#2 L B 2 -0.33154394 0.9761208
#3 M C 3 1.03657230 -0.8970942
Upvotes: 1