Reputation: 21
I have a large collection of csv files that are in different folders and in folders within folders that I need to merge into one file. It would be easy if they were all in one directory but I don't know of a simple way to pull them all out of the different folders. I could combine them one by one but there are A LOT of them.
ex:
+ working directory
|
+-- · data.csv
+-- · data2.csv
+-- + NewFolder
|
+-- · data3.csv
+-- + NewFolder2
|
+-- · data4.csv
I want one file that combines all data csv files
Upvotes: 2
Views: 3104
Reputation: 66
files <- list.files(path = ".",pattern = "*.csv",recursive = TRUE)
write.csv(do.call(rbind,lapply(files, read.csv)), "combined_data.csv")
Upvotes: 0
Reputation: 21047
You can use dir()
with recursive
set to TRUE
to list all files in the folder tree, and you can use pattern
to define a regular expression to filter the .csv
files. An example:
csv_files <- dir(pattern='.*[.]csv', recursive = T)
or even better and simpler (thanks to speendo for his comment):
csv_files <- dir(pattern='*.csv$', recursive = T)
The explanation.
pattern='*.csv$
: The pattern
argument must be a regular expression that filters the file names. This RegEx filters out the file names that end with .csv
.
If you want to filter that starts with data
, you should try a pattern like this: pattern='^data.*.csv$'
recursive=T
: Forces dir()
to traverse recursively through all folders below the working directory.After you get the file list, and assuming all of them have the same structure (i.e. all the files have the same columns), you can merge them with read.csv()
and rbind()
:
for(i in 1:length(csv_files)) {
if(i == 1)
df <- read.csv(csv_files[i])
else
df <- rdbind(df, read.csv(csv_files[i]))
}
Ramnath suggests in his comment a faster way to merge the .csv
files (again, assuming all of them have the same structure):
library(dplyr)
df <- rbind_all(lapply(csv_files, read_csv))
Upvotes: 8
Reputation: 55715
Here is a solution with dplyr
# get list of files ending in csv in directory root
dir(root, pattern='csv$', recursive = TRUE, full.names = TRUE) %>%
# read files into data frames
lapply(FUN = read.csv) %>%
# bind all data frames into a single data frame
rbind_all %>%
# write into a single csv file
write.csv("all.csv")
Upvotes: 2
Reputation: 13345
This solution has the assumption that all *.csv
files have the same structure.
(Untested)
fileList <- list.files(
pattern="*.csv$",
recursive=TRUE,
full.name=TRUE,
)
completeCSV <- data.frame()
for(file in fileList) {
print(file) # for debug: print current file
if (nrow(completeCSV) == 0) {
completeCSV <- read.csv(file)
} else {
curDF <- read.csv(file) # could also be read.csv2()
rbind(completeCSV, curDF)
}
}
write.csv(completeCSV) # could also be write.csv2()
Upvotes: 0