WonderSteve

Reputation: 927

Faster way to read multiple csv to one data frame?

Is there any way to speed up the following process in R?

theFiles <- list.files(path="./lca_rs75_summary_logs", full.names=TRUE, pattern="*.summarylog")

masterDataFrame <- NULL

for (i in seq_along(theFiles)) {
    tempDataFrame <- read.csv(theFiles[i], sep = "\t", header = TRUE)
    # Drop rows with an empty Name (subsetting with != also behaves
    # correctly when no rows match, unlike negative indexing, which
    # drops every row when which() returns integer(0))
    tempDataFrame <- tempDataFrame[tempDataFrame$Name != "", ]
    # Now stack the data frame onto the master data frame
    masterDataFrame <- rbind(masterDataFrame, tempDataFrame)
}

Basically, I am reading multiple CSV files from a directory and want to combine them into one giant data frame by stacking the rows. The loop takes longer and longer to run as masterDataFrame grows. I am doing this on a Linux cluster.
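One speed-up that stays in base R: calling rbind inside the loop copies the entire accumulated frame on every iteration, so the total cost grows quadratically with the number of rows. Collecting the per-file frames in a list and binding once at the end avoids that. A minimal sketch (the sample files and their contents are made up here for illustration):

```r
# Create a couple of small tab-separated sample files (illustrative only)
dir <- file.path(tempdir(), "lca_rs75_summary_logs")
dir.create(dir, showWarnings = FALSE)
writeLines("Name\tScore\nalpha\t1\n\t2\nbeta\t3", file.path(dir, "a.summarylog"))
writeLines("Name\tScore\ngamma\t4\n\t5", file.path(dir, "b.summarylog"))

theFiles <- list.files(path = dir, full.names = TRUE,
                       pattern = "\\.summarylog$")

# Read each file into a list element, dropping rows with an empty Name;
# subsetting with != avoids the integer(0) pitfall of negative indexing
listOfDataFrames <- lapply(theFiles, function(f) {
  d <- read.csv(f, sep = "\t", header = TRUE)
  d[d$Name != "", ]
})

# A single rbind at the end instead of one per iteration
masterDataFrame <- do.call(rbind, listOfDataFrames)
```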

Upvotes: 11

Views: 2465

Answers (1)

Arun

Reputation: 118839

Updated answer with data.table::fread.

require(data.table)
out = rbindlist(lapply(theFiles, function(file) {
         dt = fread(file)
         # further processing/filtering
         dt   # return the (possibly filtered) table from the function
      }))

fread() automatically detects the header, the file separator, and column classes; doesn't convert strings to factors by default; handles embedded quotes; and is quite fast. See ?fread for more.
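Applied to the question's filtering step, the pattern looks like this (a sketch with made-up sample files; the column names are taken from the question):

```r
library(data.table)

# Sample tab-separated files in a temp directory (illustrative only)
dir <- file.path(tempdir(), "summary_logs")
dir.create(dir, showWarnings = FALSE)
writeLines("Name\tScore\nalpha\t1\n\t2", file.path(dir, "a.summarylog"))
writeLines("Name\tScore\nbeta\t3", file.path(dir, "b.summarylog"))

theFiles <- list.files(dir, full.names = TRUE, pattern = "\\.summarylog$")

# fread each file, drop rows with an empty Name, then bind once
out <- rbindlist(lapply(theFiles, function(file) {
  dt <- fread(file)
  dt[Name != ""]   # data.table's own filtering syntax
}))
```

rbindlist is itself much faster than do.call(rbind, ...), since it preallocates the result rather than copying piecemeal.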


See history for old answers.

Upvotes: 13
