chipsin
chipsin

Reputation: 675

Combining multiple data frames in a folder with varying column names

I have number of large data frames, with each row having a unique ID and each data frame could have different sets of column names (some are the same and some are not). I would like to have all of the data frames collated into a single data frame. I am currently using the smartbind function in gtools to bind together small data frames such as those in the example in below (and the output is what I require).

df1 <- data.frame("ID" = c(1,2,3))
df1$A <- c(1, 2, 3)
df1$B <- c("a", "d", "g")
df1


df2 <- data.frame("ID" = c(7,9,10))
df2$A <- c(5, 6, 7)
df2$C <- c("f", "bv", "gn")
df2


df3 <- data.frame("ID" = c(11,12,13))
df3$A <- c("g", "h", 7)
df3$B <- c("f", "bv", "gn")
df3$D <- c(1, 5, 7)
df3

gtools::smartbind(df1,df2,df3)

    ID A    B    C  D
1:1  1 1    a <NA> NA
1:2  2 2    d <NA> NA
1:3  3 3    g <NA> NA
2:1  7 5 <NA>    f NA
2:2  9 6 <NA>   bv NA
2:3 10 7 <NA>   gn NA
3:1 11 g    f <NA>  1
3:2 12 h   bv <NA>  5
3:3 13 7   gn <NA>  7

Although this is a good solution, it is fairly manual process (and I have read that smartbind is not ideal for large data frames?). I would like to ideally have code which would allow me to bind multiple .csv files in a folder. I currently have code which joins multiple .csv files in a folder (below), but this is a solution for joining multliple data frames and not a solution for binding multiple large data frames which have different sets of column names.

Below is the code I am using to merge multiple files in a folder:

data_all <- list.files(path = "C:/Users/Documents/path/",     
                       pattern = "*.csv", full.names = TRUE) %>% 
            lapply(readr::read_csv) 

merged_files <- data_all %>% purrr::reduce(full_join, by = "ID") 

Upvotes: 0

Views: 237

Answers (1)

Ronak Shah
Ronak Shah

Reputation: 389135

You can use do.call to combine list of dataframes using smartbind.

do.call(gtools::smartbind, data_all)

smartbind doesn't require classes of column of the same type and it changes the class implicitly.


If you convert the classes explicitly, you can also do this using tidyverse functions.

library(dplyr)
library(purrr)

map_df(data_all, ~.x %>% mutate(across(-1, as.character)))

#  ID A    B    C    D
#1  1 1    a <NA> <NA>
#2  2 2    d <NA> <NA>
#3  3 3    g <NA> <NA>
#4  7 5 <NA>    f <NA>
#5  9 6 <NA>   bv <NA>
#6 10 7 <NA>   gn <NA>
#7 11 g    f <NA>    1
#8 12 h   bv <NA>    5
#9 13 7   gn <NA>    7

Upvotes: 1

Related Questions