isDotR

Reputation: 1063

More problems with "incomplete final line"

This problem is similar to that seen here.

I have a large number of large CSVs which I am loading and parsing serially through a function. Many of these CSVs present no problem, but there are several which are causing problems when I try to load them with read.csv().

I have uploaded one of these files to a public Dropbox folder here (note that the file is around 10.4MB).

When I try to read.csv() that file, I get this warning message:

In read.table(file = file, header = header, sep = sep, quote = quote,  :
  incomplete final line found by readTableHeader on ...

I cannot isolate the problem, despite scouring StackOverflow and R-help for solutions. Maddeningly, when I run

Import <- read.csv("http://dl.dropbox.com/u/83576/Candidate%20Mentions.csv")

using the Dropbox URL instead of my local path, it loads, but when I then save that very data frame and try to reload it thus:

write.csv(Import, "Test_File.csv", row.names = FALSE)
TestImport <- read.csv("Test_File.csv")

I get the "incomplete final line" warning again.

So, I am wondering why the Dropbox-loaded version works, while the local version does not, and how I can make my local versions work -- since I have somewhere around 400 of these files (and more every day), I can't use a solution that can't be automated in some way.

In a related problem, perhaps deserving of its own question, it appears that some "special characters" break the read.csv() process, and prevent the loading of the entire file. For example, one CSV which has 14,760 rows only loads 3,264 rows. The 3,264th row includes this eloquent Tweet:

"RT @akiron3: ácÎå23BkªÐÞ'q(@BarackObama )nĤÿükTPP ÍþnĤüÈ’áY‹ªÐÞĤÿüŽ \&’ŸõWˆFSnĤ©’FhÎåšBkêÕ„kĤüÈLáUŒ~YÒhttp://t.co/ABNnWfTN “jg)(WˆF"

Again, given the serialized loading of several hundred files, how can I (a) identify what is causing this break in the read.csv() process, and (b) fix the problem with code, rather than by hand?

Thanks so much for your help.

Upvotes: 3

Views: 2420

Answers (1)

IRTFM

Reputation: 263362

1)

 suppressWarnings(TestImport <- read.csv("Test_File.csv") )

2) Unmatched quotes are the most common cause of apparent premature closure. You could try adding all of these:

 quote="", na.strings="", comment.char=""
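
For a batch of several hundred files, the two suggestions above could be combined roughly as follows. This is only a sketch: `read_csv_lenient` is a hypothetical helper name, the temporary file stands in for one of the real CSVs, and collecting the warnings (rather than suppressing them) is an assumption about what is useful for triage. Note that `quote=""` turns off quote handling entirely, so commas inside quoted fields will be treated as separators; it may not suit files produced by write.csv().

```r
# Hypothetical helper: read one CSV with quote handling, NA strings and
# comment characters disabled, recording any warnings instead of hiding them.
read_csv_lenient <- function(path) {
  msgs <- character(0)
  data <- withCallingHandlers(
    read.csv(path, quote = "", na.strings = "", comment.char = "",
             stringsAsFactors = FALSE),
    warning = function(w) {
      msgs <<- c(msgs, conditionMessage(w))  # keep the message for later review
      invokeRestart("muffleWarning")         # continue reading the file
    }
  )
  list(data = data, warnings = msgs)
}

# Small stand-in file whose second data row has an unmatched quote.
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,text", "1,plain", "2,\"unterminated", "3,after"), tmp)

res <- read_csv_lenient(tmp)
n_rows <- nrow(res$data)  # all three data rows are kept, since quotes are ignored
```

Applied over a vector of file paths with lapply(), the `warnings` element of each result shows which files still need a closer look.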

Upvotes: 3
