WonderSteve

Reputation: 927

Faster way to read multiple csv to one data frame?

Is there any way to speed up the following process in R?

theFiles <- list.files(path="./lca_rs75_summary_logs", full.names=TRUE, pattern="*.summarylog")

masterDataFrame <- NULL

for (i in seq_along(theFiles)) {
    tempDataFrame <- read.csv(theFiles[i], sep = "\t", header = TRUE)
    # Drop rows with an empty Name (subsetting with != also behaves
    # correctly when no rows match, unlike negative indexing, which
    # drops every row when which() returns integer(0))
    tempDataFrame <- tempDataFrame[tempDataFrame$Name != "", ]
    # Now stack the data frame onto the master data frame
    masterDataFrame <- rbind(masterDataFrame, tempDataFrame)
}

Basically, I am reading multiple CSV files from a directory and want to combine them into one giant data frame by stacking the rows. The loop takes longer and longer to run as masterDataFrame grows. I am doing this on a Linux cluster.
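One speed-up that stays in base R: calling rbind inside the loop copies the entire accumulated frame on every iteration, so the total cost grows quadratically with the number of rows. Collecting the per-file frames in a list and binding once at the end avoids that. A minimal sketch (the sample files and their contents are made up here for illustration):

```r
# Create a couple of small tab-separated sample files (illustrative only)
dir <- file.path(tempdir(), "lca_rs75_summary_logs")
dir.create(dir, showWarnings = FALSE)
writeLines("Name\tScore\nalpha\t1\n\t2\nbeta\t3", file.path(dir, "a.summarylog"))
writeLines("Name\tScore\ngamma\t4\n\t5", file.path(dir, "b.summarylog"))

theFiles <- list.files(path = dir, full.names = TRUE,
                       pattern = "\\.summarylog$")

# Read each file into a list element, dropping rows with an empty Name;
# subsetting with != avoids the integer(0) pitfall of negative indexing
listOfDataFrames <- lapply(theFiles, function(f) {
  d <- read.csv(f, sep = "\t", header = TRUE)
  d[d$Name != "", ]
})

# A single rbind at the end instead of one per iteration
masterDataFrame <- do.call(rbind, listOfDataFrames)
```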

Upvotes: 11

Views: 2465

Answers (1)

Arun

Reputation: 118839

Updated answer with data.table::fread.

require(data.table)
out = rbindlist(lapply(theFiles, function(file) {
         dt = fread(file)
         # further processing/filtering
         dt   # return the (possibly filtered) table from the function
      }))

fread() automatically detects the header, the file separator, and column classes; doesn't convert strings to factors by default; handles embedded quotes; and is quite fast. See ?fread for more.
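Applied to the question's filtering step, the pattern looks like this (a sketch with made-up sample files; the column names are taken from the question):

```r
library(data.table)

# Sample tab-separated files in a temp directory (illustrative only)
dir <- file.path(tempdir(), "summary_logs")
dir.create(dir, showWarnings = FALSE)
writeLines("Name\tScore\nalpha\t1\n\t2", file.path(dir, "a.summarylog"))
writeLines("Name\tScore\nbeta\t3", file.path(dir, "b.summarylog"))

theFiles <- list.files(dir, full.names = TRUE, pattern = "\\.summarylog$")

# fread each file, drop rows with an empty Name, then bind once
out <- rbindlist(lapply(theFiles, function(file) {
  dt <- fread(file)
  dt[Name != ""]   # data.table's own filtering syntax
}))
```

rbindlist is itself much faster than do.call(rbind, ...), since it preallocates the result rather than copying piecemeal.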


See history for old answers.

Upvotes: 13
