Reputation: 355
While reading a squid log in zipped format using read.table(), I get the following error:
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : line 134147 did not have 10 elements
Unzipping the file, I see that line 134147 is corrupted. However, other lines are similarly corrupted, so it is not practical to repeatedly run read.table, note the offending line number, delete that line, and start the whole process over again.
Is there a way by which I could tell R to ignore such lines and continue reading the rest of the table? I tried with try() but without any success.
I have read some related posts on ignoring read.table() errors, but all of them suggest correcting the offending line(s), which is not an option for me: the files are zipped, so I would have to unzip them manually, and there may be several corrupted lines.
My code for reading (with the try block):
try({dfApr4gw1 <- read.table(
"log1.gz", header=FALSE,
col.names = c("time", "duration", "local ip", "squid result code", "bytes", "request method", "url", "user", "squid hierarchy code", "type"),
na.strings="-",
colClasses = c("numeric", "integer", "factor", "factor", "integer", "factor", "character", "character", "character", "factor")
)})
Upvotes: 2
Views: 518
Reputation: 521093
One option which might suit your use case would be to read in each line as a single column, and then split each line on its delimiter, which in a squid log is a space. You can discard any rows which do not have the 10 columns you are expecting.
dfApr4gw1 <- read.table(
    "log1.gz", header=FALSE, sep="\n", quote="", comment.char="",
    col.names = c("column"), na.strings="-",
    colClasses = c("character"))
rows.keep <- apply(dfApr4gw1, 1, function(x) {
    length(strsplit(x, " ")[[1]]) == 10
})
dfApr4gw1 <- dfApr4gw1[rows.keep, ]
Now dfApr4gw1 only contains well-formed rows, and you can convert it to a data frame with 10 columns of the appropriate types easily.
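For example, that conversion step could look like this. This is a sketch using made-up sample log lines, assuming a single-space delimiter; the column names and types follow the ones in your original read.table call:

```r
# Toy stand-in for the filtered single-column data frame (two well-formed lines)
dfApr4gw1 <- data.frame(
    column = c(
        "1333497600.123 250 10.0.0.1 TCP_MISS/200 1024 GET http://example.com/ user1 DIRECT/93.184.216.34 text/html",
        "1333497601.456 120 10.0.0.2 TCP_HIT/200 512 GET http://example.org/ user2 NONE/- text/css"
    ),
    stringsAsFactors = FALSE
)

# Split each line on single spaces and rebind into a 10-column data frame
fields <- do.call(rbind, strsplit(dfApr4gw1$column, " ", fixed = TRUE))
df <- as.data.frame(fields, stringsAsFactors = FALSE)
names(df) <- c("time", "duration", "local.ip", "squid.result.code", "bytes",
               "request.method", "url", "user", "squid.hierarchy.code", "type")

# Restore the numeric/integer columns
df$time     <- as.numeric(df$time)
df$duration <- as.integer(df$duration)
df$bytes    <- as.integer(df$bytes)
```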
Upvotes: 1