Reputation: 675
I have number of large data frames, with each row having a unique ID and each data frame could have different sets of column names (some are the same and some are not). I would like to have all of the data frames collated into a single data frame. I am currently using the smartbind
function in gtools
to bind together small data frames such as those in the example in below (and the output is what I require).
df1 <- data.frame("ID" = c(1,2,3))
df1$A <- c(1, 2, 3)
df1$B <- c("a", "d", "g")
df1
df2 <- data.frame("ID" = c(7,9,10))
df2$A <- c(5, 6, 7)
df2$C <- c("f", "bv", "gn")
df2
df3 <- data.frame("ID" = c(11,12,13))
df3$A <- c("g", "h", 7)
df3$B <- c("f", "bv", "gn")
df3$D <- c(1, 5, 7)
df3
gtools::smartbind(df1,df2,df3)
ID A B C D
1:1 1 1 a <NA> NA
1:2 2 2 d <NA> NA
1:3 3 3 g <NA> NA
2:1 7 5 <NA> f NA
2:2 9 6 <NA> bv NA
2:3 10 7 <NA> gn NA
3:1 11 g f <NA> 1
3:2 12 h bv <NA> 5
3:3 13 7 gn <NA> 7
Although this is a good solution, it is fairly manual process (and I have read that smartbind is not ideal for large data frames?). I would like to ideally have code which would allow me to bind multiple .csv files in a folder. I currently have code which joins multiple .csv files in a folder (below), but this is a solution for joining multliple data frames and not a solution for binding multiple large data frames which have different sets of column names.
Below is the code I am using to merge multiple files in a folder:
data_all <- list.files(path = "C:/Users/Documents/path/",
pattern = "*.csv", full.names = TRUE) %>%
lapply(readr::read_csv)
merged_files <- data_all %>% purrr::reduce(full_join, by = "ID")
Upvotes: 0
Views: 237
Reputation: 389135
You can use do.call
to combine list of dataframes using smartbind
.
do.call(gtools::smartbind, data_all)
smartbind
doesn't require classes of column of the same type and it changes the class implicitly.
If you convert the classes explicitly, you can also do this using tidyverse
functions.
library(dplyr)
library(purrr)
map_df(data_all, ~.x %>% mutate(across(-1, as.character)))
# ID A B C D
#1 1 1 a <NA> <NA>
#2 2 2 d <NA> <NA>
#3 3 3 g <NA> <NA>
#4 7 5 <NA> f <NA>
#5 9 6 <NA> bv <NA>
#6 10 7 <NA> gn <NA>
#7 11 g f <NA> 1
#8 12 h bv <NA> 5
#9 13 7 gn <NA> 7
Upvotes: 1