Reputation: 53
I have a large number of .txt files that I am trying to import into R. Some of the .txt files are empty, but I still want my data.frame to record which .txt files specifically are empty. With the following code I can import all the necessary .txt files into a list:
file_list <- list.files()
myList <- lapply(file_list, function(x) {tryCatch(read.table(x, header = F, sep = '|'), error=function(e) NULL)})
However, when I change this list into a data.frame with the following code:
myDataframe <- rbind.fill(lapply(myList, function(f) {as.data.frame(Filter(Negate(is.null), f))}))
I lose the information about which .txt-files were empty.
Ultimately, I would like each row to get an extra column containing the name of the .txt file it came from, for example via list.files(). That way, I could see which rows come from empty files.
Upvotes: 0
Views: 439
Reputation: 834
This is a solution to your problem, but not specifically an answer to your question.
Depending on the size of your text files, you should consider switching to data.table, a nifty system for handling large files rapidly and in a memory-efficient manner.
install.packages("data.table")
library(data.table)
file_list <- list.files()
Results <- NULL
for (i in file_list){
# data.table command to read txt files
i.file <- tryCatch(fread(i,colClasses="character"), error=function(e) e)
if(!is.data.table(i.file)){
# fread raised an error, so i.file holds the error object rather than a data.table
i.file <- data.table(cbind(txt.file=i,is.empty="YES",
message=i.file$message))
} else if(nrow(i.file)==0){
# Just in case files are still empty even though no error
i.file <- data.table(cbind(txt.file=i,is.empty="YES",
message=NA))
} else {
i.file[,txt.file:=i]
i.file[,is.empty:="NO"]
}
Results <- rbind(Results,i.file,fill=TRUE)
rm(i.file);gc()
}
# to find which files are empty
Results[is.empty=="YES"][,txt.file]
# double check error types
Results[!is.na(message)][,message]
# expect all to be something like 'file is empty'
# if you insist on using data.frames
Results <- data.frame(Results)
This should work for you. The script may be converted to a function that works with lapply, but I wanted it to be easy to understand and generalize.
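The lapply conversion could look roughly like this. It is only a sketch: the helper name read_one is made up for illustration, the .txt pattern passed to list.files() is an assumption, and it assumes an empty file either makes fread raise an error or comes back with zero rows (which of the two happens may depend on the data.table version).

```r
library(data.table)

# Hypothetical helper: read one file and always return a data.table
# that carries the file name and an emptiness flag.
read_one <- function(path) {
  res <- tryCatch(
    # suppressWarnings hides the warning fread may give for size-0 files
    suppressWarnings(fread(path, colClasses = "character")),
    error = function(e) e
  )
  if (inherits(res, "error")) {
    # fread failed: keep the file name and the error message
    return(data.table(txt.file = path, is.empty = "YES",
                      message = conditionMessage(res)))
  }
  if (nrow(res) == 0L) {
    # fread succeeded but found no rows
    return(data.table(txt.file = path, is.empty = "YES",
                      message = NA_character_))
  }
  res[, txt.file := path]
  res[, is.empty := "NO"]
  res[]
}

# Assumption: only .txt files in the working directory are of interest
file_list <- list.files(pattern = "\\.txt$")
Results <- rbindlist(lapply(file_list, read_one), fill = TRUE)
```

rbindlist with fill=TRUE pads the missing columns with NA, so each empty file still ends up as one row that names the file.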
Also, I'm a huge fan of data.table, and transitioning to it was really helpful to me. For more on the package, check out this cheatsheet.
EDIT: Script modified so it can accommodate empty files that contain only whitespace.
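If you want to check for whitespace-only files independently of how fread reacts to them, a small base R test is one option. This is a separate sketch, not part of the script above: the helper name is_blank_file is hypothetical, and it reads the whole file into memory, so it is only sensible for files that are not huge.

```r
# Hypothetical helper: TRUE when a file has no non-whitespace content,
# i.e. it is zero-length or holds only spaces, tabs and blank lines.
is_blank_file <- function(path) {
  lines <- readLines(path, warn = FALSE)
  !any(nzchar(trimws(lines)))
}
```

It could be applied up front with something like vapply(file_list, is_blank_file, logical(1)) to get a named logical vector of blank files.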
Upvotes: 1