Tshawytscha
Tshawytscha

Reputation: 21

How to merge csv files from nested folders in R

I have a large collection of csv files that are in different folders and in folders within folders that I need to merge into one file. It would be easy if they were all in one directory but I don't know of a simple way to pull them all out of the different folders. I could combine them one by one but there are A LOT of them.

ex:

+ working directory
|
+-- · data.csv
+-- · data2.csv
+-- + NewFolder
    |
    +-- · data3.csv
    +-- + NewFolder2
        |
        +-- · data4.csv

I want one file that combines all data csv files

Upvotes: 2

Views: 3104

Answers (4)

Charles Gallagher
Charles Gallagher

Reputation: 66

Base R solution.

files <- list.files(path = ".",pattern = "*.csv",recursive = TRUE)

write.csv(do.call(rbind,lapply(files, read.csv)), "combined_data.csv")

Upvotes: 0

Barranka
Barranka

Reputation: 21047

You can use dir() with recursive set to TRUE to list all files in the folder tree, and you can use pattern to define a regular expression to filter the .csv files. An example:

csv_files <- dir(pattern='.*[.]csv', recursive = T)

or even better and simpler (thanks to speendo for his comment):

csv_files <- dir(pattern='*.csv$', recursive = T)

The explanation.

  • pattern='*.csv$: The pattern argument must be a regular expression that filters the file names. This RegEx filters out the file names that end with .csv.

    If you want to filter that starts with data, you should try a pattern like this: pattern='^data.*.csv$'

  • recursive=T: Forces dir() to traverse recursively through all folders below the working directory.

After you get the file list, and assuming all of them have the same structure (i.e. all the files have the same columns), you can merge them with read.csv() and rbind():

for(i in 1:length(csv_files)) {
  if(i == 1)
    df <- read.csv(csv_files[i])
  else
    df <- rdbind(df, read.csv(csv_files[i]))
}

Ramnath suggests in his comment a faster way to merge the .csv files (again, assuming all of them have the same structure):

library(dplyr)
df <- rbind_all(lapply(csv_files, read_csv))

Upvotes: 8

Ramnath
Ramnath

Reputation: 55715

Here is a solution with dplyr

# get list of files ending in csv in directory root
dir(root, pattern='csv$', recursive = TRUE, full.names = TRUE) %>%
  # read files into data frames
  lapply(FUN = read.csv) %>%
  # bind all data frames into a single data frame
  rbind_all %>%
  # write into a single csv file
  write.csv("all.csv")

Upvotes: 2

speendo
speendo

Reputation: 13345

This solution has the assumption that all *.csv files have the same structure.

(Untested)

fileList <- list.files(
  pattern="*.csv$",
  recursive=TRUE,
  full.name=TRUE,
  )

completeCSV <- data.frame()


for(file in fileList) {
  print(file) # for debug: print current file
  if (nrow(completeCSV) == 0) {
    completeCSV <- read.csv(file)
  } else {
    curDF <- read.csv(file) # could also be read.csv2()
    rbind(completeCSV, curDF)
  }
}

write.csv(completeCSV) # could also be write.csv2()

Upvotes: 0

Related Questions