user3476463

Reputation: 4575

Error importing and merging text files with read.table() or read.csv()

I'm trying to import a large number of text files and merge them into a single data table using the script below, so that I can parse the text. The files were originally .eml files, so the formatting is a mess. I'm not interested in separating the text into fields; it would be perfectly fine if the table had only one field containing all the text from the files. When I run the script, I keep getting the following error:

Error in rbind(deparse.level, ...) : 
  numbers of columns of arguments do not match 

I've tried setting sep to various values and omitting it entirely, but I still get the same error. I've also tried replacing read.table with read.csv, with the same result. Any tips would be greatly appreciated.

setwd("~/stuff/folder")
file_list <- list.files()
for (file in file_list){
  # if the merged dataset doesn't exist, create it
  if (!exists("dataset")){
    dataset <- read.table(file, header=FALSE,fill=TRUE,comment.char="",strip.white = TRUE)
  }
  # if the merged dataset does exist, append to it
  if (exists("dataset")){
    temp_dataset <-read.table(file, header=FALSE,fill=TRUE,comment.char="",strip.white = TRUE)
    dataset<-rbind(dataset, temp_dataset)
    rm(temp_dataset)
  }
}

Upvotes: 0

Views: 160

Answers (1)

Julian Wittische

Reputation: 1237

I think something lighter could work for you and may avoid this specific error:

# number.of.files and the "lolz" prefix are placeholders for your own file count and names
them.files <- lapply(1:number.of.files, function(x)
  read.table(paste(paste("lolz", x, sep=""), 'txt', sep='.'),
             header=FALSE, fill=TRUE, comment.char="", strip.white=TRUE))

Adapt the function to whatever your file names are.

Edit: Actually maybe something like this could be better:

them.files <- lapply(1:length(file_list), function(x)
  read.table(file_list[x], header=FALSE, fill=TRUE, comment.char="", strip.white=TRUE))

Merging step:

everyday.Im.merging <- do.call(rbind,them.files)
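Note that rbind() will still stop with the same "numbers of columns" error if the list elements differ in width, which can happen because read.table() with fill=TRUE infers the column count from the first few lines of each file. Since the question says a single text field would be fine, one possible workaround (a sketch, not a tested fix for your data) is to read each file with readLines() instead, so every piece has the same two columns. The temporary directory, the demo file names, and the `file`/`text` column names below are made up for illustration.

```r
# Hypothetical demo files standing in for the real text files
tmp_dir <- tempfile("eml_demo")
dir.create(tmp_dir)
writeLines(c("From: a@example.com", "Hello there, world"),
           file.path(tmp_dir, "one.txt"))
writeLines(c("Subject: hi", "", "A much longer line with many words"),
           file.path(tmp_dir, "two.txt"))

file_list <- list.files(tmp_dir, full.names = TRUE)

# Read every file as raw lines: one row per line, always two columns
pieces <- lapply(file_list, function(f) {
  lines <- readLines(f, warn = FALSE)
  data.frame(file = rep(basename(f), length(lines)),  # source file of each line
             text = lines,                            # the unparsed line itself
             stringsAsFactors = FALSE)
})

# All pieces now share the same columns, so rbind has nothing to complain about
dataset <- do.call(rbind, pieces)
str(dataset)
```

Keeping the file name next to each line is optional, but it makes it easier to trace a line back to its source later.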

I am sure there are beautiful ways to do it with dplyr or data.table but I am a caveman.

If I may add something, I would also run a checking step before the merging line above:

sapply(them.files,str)
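If the str() output is too long to scan, comparing column counts directly may be quicker. The sketch below uses two made-up data frames in place of the real them.files, and pad_to() is a hypothetical helper (not part of base R) that adds NA columns so every piece has the same width before rbind. It assumes the default V1, V2, ... column names that read.table() produces with header=FALSE.

```r
# Stand-in for the list produced by lapply(..., read.table) above
them.files <- list(
  data.frame(V1 = c("a", "b"), V2 = c("c", "d"), stringsAsFactors = FALSE),
  data.frame(V1 = "e", V2 = "f", V3 = "g", stringsAsFactors = FALSE)
)

# Column count per file; more than one distinct value means rbind will fail
n_cols <- sapply(them.files, ncol)
all_same <- length(unique(n_cols)) == 1
print(n_cols)

# Hypothetical helper: pad a data frame with NA columns up to n columns
pad_to <- function(df, n) {
  wanted <- paste0("V", seq_len(n))
  for (nm in setdiff(wanted, names(df))) df[[nm]] <- NA_character_
  df[wanted]
}

padded <- lapply(them.files, pad_to, n = max(n_cols))
merged <- do.call(rbind, padded)
print(merged)
```

Padding keeps whatever column split read.table() chose, so the pieces of one original line may land in different columns from file to file; whether that matters depends on how you plan to parse the text.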

Upvotes: 1
