Oksanna88

Reputation: 31

Merging a large number of files from one directory into a data frame in R

I have a large number of data files (>1000) in a single directory. I would like to merge them all into a single data frame in R. They all have the same number and types of columns. So far, what I have is:

setwd("directory")
files <- list.files()
for (i in 1:length(files)) assign(files[i], read.csv(files[i]))

This creates data frames for each of the 1000+ files. Is there any way to merge them, without having to type out a list of all 1000+ file names?

Any help would be appreciated!

Upvotes: 2

Views: 2963

Answers (2)

MichaelChirico

Reputation: 34703

The standard way to do this with data.table (recommended because of its speed) is:

library(data.table)
data <- rbindlist(lapply(list.files(), fread))

rbindlist also takes further options. For example, with fill = TRUE:

rbindlist(lapply(list.files(), fread), fill = TRUE)

the call handles the case where some or many of your files have different column names: any column that a file lacks is filled with NA for that file's rows.
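To make the effect of fill = TRUE concrete, here is a minimal sketch. The two tables a and b are made up for illustration and stand in for two files read by fread; the second one has a column y that the first lacks.

```r
library(data.table)

# Two hypothetical tables standing in for files whose columns differ
a <- data.table(id = 1:2, x = c(10, 20))
b <- data.table(id = 3L, x = 30, y = "z")

# fill = TRUE keeps the union of the columns and pads the gaps with NA
combined <- rbindlist(list(a, b), fill = TRUE)
print(combined)
```

The rows that came from a have NA in column y, since a has no such column.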


EDIT: as @nicola mentioned, using assign is to be avoided in general unless you really know what you're doing.

See this post for further reference to that end.
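For illustration, the assign-free pattern looks roughly like this in base R: read every file into one list, then bind the list. This is only a sketch; the temporary directory and the two small CSV files are made up so that the example runs on its own.

```r
# Create a throwaway directory with two small CSV files (made-up data)
dir <- tempfile("csv_demo")
dir.create(dir)
write.csv(data.frame(id = 1:2, value = c(1.5, 2.5)),
          file.path(dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:4, value = c(3.5, 4.5)),
          file.path(dir, "b.csv"), row.names = FALSE)

# Read each file into one element of a list instead of one variable per file
files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
tables <- lapply(files, read.csv)

# Stack the list into a single data frame
merged <- do.call(rbind, tables)
print(merged)
```

With 1000+ files, the rbindlist approach above is likely to be faster than do.call(rbind, ...); the point of the sketch is only that a list replaces the 1000+ separately named data frames.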

Upvotes: 12

Nikos

Reputation: 3297

One good way to do that is to use data.table. The package has two features that help in your case: a) a fast reader for .csv files (fread), and b) a fast way of combining data.tables (which are an extension of data.frame) into one (rbindlist). In this spirit, let me propose the following alternative:

# If data.table is not installed, run install.packages("data.table") first
library(data.table)
files <- list.files("directory", full.names = TRUE)
# Preallocate a list to hold the individual tables; it is only used for the final merge
FILES_LIST <- vector("list", length(files))
# seq_along() is safe even if the directory turns out to be empty
for (i in seq_along(files)) {
    FILES_LIST[[i]] <- fread(files[i])  # read one .csv file
}
FILES_LIST <- rbindlist(FILES_LIST)  # bind all the tables into one big data.table

The variable you are interested in at the end is FILES_LIST, which now holds the combined data.table.

I hope this helps.

Upvotes: 3
