Reputation: 31
I have a large number of data files (>1000) in a single directory. I would like to merge them all into a single data frame in R. They all have the same number and types of columns. So far, what I have is:
setwd("directory")
files <- list.files()
for (i in 1:length(files)) assign(files[i], read.csv(files[i]))
This creates a data frame for each of the 1000+ files. Is there any way to merge them without having to type out a list of all 1000+ file names?
Any help would be appreciated!
Upvotes: 2
Views: 2963
Reputation: 34703
The standard way to do this with data.table (recommended because of its speed) is:
library(data.table)
data <- rbindlist(lapply(list.files(), fread))
There is also additional functionality; for example,
rbindlist(lapply(list.files(), fread), fill = TRUE)
will handle the case where some or many of your files have different column names: any non-overlapping column is filled with NA in the files that lack it.
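To make the fill = TRUE behavior concrete, here is a minimal, self-contained sketch; the toy tables a and b are illustrative, not from the question:

```r
library(data.table)

# two tables whose columns only partially overlap
a <- data.table(id = 1:2, x = c(10, 20))
b <- data.table(id = 3:4, y = c(30, 40))

# without fill = TRUE this would error; with it, missing columns become NA
combined <- rbindlist(list(a, b), fill = TRUE)
print(combined)  # columns id, x, y; x is NA for b's rows, y is NA for a's rows
```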
EDIT: as @nicola mentioned, using assign is to be avoided in general unless you really know what you're doing. See this post for further reference to that end.
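As a sketch of the list-based pattern usually recommended in place of assign() (the file names and toy data here are illustrative; a temporary directory stands in for the asker's directory):

```r
# instead of assign()-ing 1000+ loose objects, keep the data frames in one named list
dir <- file.path(tempdir(), "csv_demo")
dir.create(dir, showWarnings = FALSE)
write.csv(data.frame(a = 1:2, b = 3:4), file.path(dir, "one.csv"), row.names = FALSE)
write.csv(data.frame(a = 5:6, b = 7:8), file.path(dir, "two.csv"), row.names = FALSE)

files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
dfs <- lapply(files, read.csv)   # one list element per file
names(dfs) <- basename(files)    # access as dfs[["one.csv"]] instead of a loose variable
merged <- do.call(rbind, dfs)    # a single data frame, as in the question
```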
Upvotes: 12
Reputation: 3297
One good way to do this is with data.table. The package has two benefits that apply to your case: (a) fread, a fast way of reading .csv files, and (b) rbindlist, a fast way of combining data.tables (an extension of data.frame) into one. In that spirit, let me propose the following alternative:
# if you don't have data.table installed, run install.packages("data.table") first
library(data.table)
files <- list.files("directory", full.names = TRUE)
# create a list to hold the individual files; it is only used to merge them at the end
files_list <- vector("list", length(files))
for (i in seq_along(files)) {
  files_list[[i]] <- fread(files[i])  # read each .csv file
}
files_list <- rbindlist(files_list)  # merge all files into one big data.table
The variable you are interested in at the end is files_list.
I hope this helps.
Upvotes: 3