David Maij

Reputation: 53

Import a large number of .txt files into a data.frame, including empty .txt files by giving them a row in the data.frame

I have a large number of .txt files that I am trying to import into R. Some of them are empty, but I still want my data.frame to record which .txt files are empty. With the following code I can import all the necessary .txt files into a list:

file_list <- list.files()
myList <- lapply(file_list, function(x) {tryCatch(read.table(x, header = F, sep = '|'), error=function(e) NULL)})

However, when I change this list into a data.frame with the following code:

library(plyr)  # provides rbind.fill
myDataframe <- rbind.fill(lapply(myList, function(f) {as.data.frame(Filter(Negate(is.null), f))}))

I lose the information about which .txt-files were empty.

Ultimately, I would like each row to get an extra column holding the name of the .txt file it came from, for example via list.files(). That way, I could see which files are empty.

Upvotes: 0

Views: 439

Answers (1)

JustGettinStarted

Reputation: 834

This is a solution to your problem, but not specifically an answer to your question.

Depending on the size of your text files, you should consider switching to data.table, a nifty system for handling large files rapidly and in a memory-efficient manner.

# install.packages("data.table")  # run once if the package is not installed
library(data.table)
file_list <- list.files()

Results <- NULL
for (i in file_list) {
  # fread is data.table's file reader; on failure keep the error object
  i.file <- tryCatch(fread(i, colClasses = "character"), error = function(e) e)

  if (!inherits(i.file, "data.table")) {
    # fread raised an error instead of returning a table:
    # record the file name and the error message in a one-row table
    i.file <- data.table(txt.file = i, is.empty = "YES",
                         message = conditionMessage(i.file))
  } else if (nrow(i.file) == 0) {
    # Just in case a file is still empty even though no error occurred
    i.file <- data.table(txt.file = i, is.empty = "YES",
                         message = NA_character_)
  } else {
    # Non-empty file: tag every row with its source file
    i.file[, txt.file := i]
    i.file[, is.empty := "No"]
  }
  Results <- rbind(Results, i.file, fill = TRUE)
  rm(i.file); gc()
}

# to find which files are empty
Results[is.empty=="YES"][,txt.file]
# double check error types (rows that carry an error message)
Results[is.empty=="YES" & !is.na(message)][,message]
# expect all to be something like 'file is empty'

# if you insist on using data.frames
Results <- data.frame(Results)

This should work for you. The script could be converted to a function that works with lapply, but I wanted it to be easy to understand and generalize.
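For reference, the lapply variant mentioned above could look roughly like the sketch below. The helper name read_one and the "\\.txt$" pattern are my own assumptions, not part of the original script. As in the loop version, any file that fread fails on is flagged as empty, so it is worth checking the message column afterwards.

```r
library(data.table)

# Hypothetical helper (name is an assumption): read one file and always
# return a data.table with at least one row, tagged with the file name.
read_one <- function(path) {
  # On failure, keep the error object instead of stopping
  dt <- tryCatch(fread(path, colClasses = "character"),
                 error = function(e) e)

  if (!inherits(dt, "data.table")) {
    # fread raised an error: one placeholder row carrying the message
    return(data.table(txt.file = path, is.empty = "YES",
                      message = conditionMessage(dt)))
  }
  if (nrow(dt) == 0L) {
    # No error, but nothing was read: one placeholder row
    return(data.table(txt.file = path, is.empty = "YES",
                      message = NA_character_))
  }
  # set() assigns by reference and avoids clashes with column names
  set(dt, j = "txt.file", value = path)
  set(dt, j = "is.empty", value = "No")
  dt
}

# Assumes the .txt files sit in the working directory
file_list <- list.files(pattern = "\\.txt$")
Results <- rbindlist(lapply(file_list, read_one), fill = TRUE)
```

rbindlist with fill = TRUE plays the role of rbind(..., fill = TRUE) in the loop, but binds everything in one step instead of growing Results on every iteration.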

Also I'm a huge fan of data.table and transitioning to it was really helpful to me. For more on the package check out this cheatsheet.

EDIT: Script modified so it can accommodate empty files that contain only whitespace.

Upvotes: 1
