Reputation: 397
I am trying to read a big file into R with fread, and the error shown below occurs. It does not go away even when I skip the first 800607 lines. I also tried deleting the offending line in the terminal with this command:
sed '800608d' filename.csv
That didn't solve my problem. I would really appreciate any help.
The original error I got from R is:
> data<-fread("filename.csv")
Read 2.0% of 34143409 rows
Error in fread("filename.csv") :
Field 16 on line 800607 starts with quote (") but then has a problem. It can contain balanced unescaped quoted subregions but if it does it can't contain embedded \n as well. Check for unbalanced unescaped quotes: """The attorney for Martin's family, Benjamin Crump, says the evidence is ""irrelevant\"""" """".","NULL","NULL","NULL","NULL","NULL","NULL","NULL","Negative"
In addition: Warning message:
In fread("filename.csv") :
Starting data input on line 8 and discarded previous non-empty line: done
Upvotes: 3
Views: 2345
Reputation: 54
I'm currently in the middle of resolving this kind of issue myself. I'm not sure this will work in all cases, or even for all of the files I'm working with. But for now I seem to be having some success with:
library(data.table)

skip.list <- c()  # files that still fail after one retry
files <- dir(input.dir)
for (i in seq_along(files)) {
  file <- files[i]
  ingested.file <- try(fread(paste0(input.dir, file), header=TRUE, stringsAsFactors=FALSE))
  if (inherits(ingested.file, "try-error")) {
    # pull the line number out of the error message ("... but line N ...")
    error.line <- as.integer(sub(" .*", "", sub(".*but line ", "", as.character(ingested.file))))
    # retry once, skipping everything up to and including the offending line
    ingested.file <- try(fread(paste0(input.dir, file), header=TRUE, stringsAsFactors=FALSE, skip=error.line))
    if (inherits(ingested.file, "try-error")) {
      skip.list <- c(skip.list, file)  # give up on this file
      next
    }
  }
}
I'm currently working with about 750 files of 1000 rows each, about 50 of which have the same issue. With this method I am able to read in 30 of those 50. The remaining 20 seem to have errors in multiple rows, and I am unable to specify multiple skip values.
If it were possible to specify more than one skip value, you could try a while loop, i.e.
while (inherits(ingested.file, "try-error")) ...
and then update error.line automatically as many times as necessary.
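One way to approximate that loop without multiple skip values is to hold the file's lines in memory, drop whichever line the error message names, and try again. The following is only a sketch: read_dropping_bad_lines and its reader argument are hypothetical names, it assumes the error message contains the text "line N", and it assumes N counts the same physical lines that readLines returns (which may not hold when quoted fields contain embedded newlines).

```r
# Hypothetical helper: retry a reader, dropping the line named in each error.
# `lines`  : character vector, one element per line of the file
# `reader` : function taking such a vector and returning the parsed data
read_dropping_bad_lines <- function(lines, reader, max_tries = 100L) {
  keep <- seq_along(lines)  # original indices of the lines still in play
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(reader(lines[keep]), error = function(e) e)
    if (!inherits(result, "error")) {
      return(list(data = result, dropped = setdiff(seq_along(lines), keep)))
    }
    # Assumption: the message mentions the offending line as "line N"
    msg <- conditionMessage(result)
    hit <- regmatches(msg, regexpr("line [0-9]+", msg))
    if (length(hit) == 0L) stop("no line number found in error: ", msg)
    bad <- as.integer(sub("line ", "", hit))
    if (is.na(bad) || bad < 1L || bad > length(keep)) {
      stop("line number out of range: ", msg)
    }
    keep <- keep[-bad]  # drop that line and try again
  }
  stop("still failing after ", max_tries, " attempts")
}
```

With data.table loaded, it might be called as read_dropping_bad_lines(readLines(paste0(input.dir, file)), function(x) fread(text = x, header = TRUE)). The dropped element of the result records which original lines were discarded. This reads the whole file into memory, so it suits the 1000-row files better than a 34-million-row one.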
I hope this helps!
Upvotes: 2