Reputation: 335
I have a number of CSV files of different sizes, but all somewhat big. Using read.csv to read them into R takes longer than I've been patient to wait so far (several hours). I managed to read the biggest file (2.6 GB) very quickly (in less than a minute) with data.table's fread.
My problem occurs when I try to read a file of half the size. I get the following error message:
```
Error in fread("C:/Users/Jesper/OneDrive/UdbudsVagten/BBR/CO11700T.csv", :
  Expecting 21 cols, but line 2557 contains text after processing all cols.
  It is very likely that this is due to one or more fields having embedded
  sep=';' and/or (unescaped) '\n' characters within unbalanced unescaped
  quotes. fread cannot handle such ambiguous cases and those lines may not
  have been read in as expected. Please read the section on quotes in ?fread.
```
Through research I've found suggestions to add quote = "" to the call, but that doesn't help me. I've also tried the bigmemory package, but R crashes when I do. I'm on a 64-bit system with 8 GB of RAM.

I know there are quite a few threads on this subject, but I haven't been able to solve the problem with any of the suggested solutions. I would really like to use fread (given my good experience with the bigger file), and it seems like there should be some way to make it work - I just can't figure it out.
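For reference, one way to locate and inspect the offending rows is to read the file as raw lines and count separators per line. This is a minimal diagnostic sketch, assuming the path from the error message above and that a well-formed 21-column row carries exactly 20 semicolons:

```r
# Diagnostic sketch: find rows whose separator count doesn't match 21 columns.
path <- "C:/Users/Jesper/OneDrive/UdbudsVagten/BBR/CO11700T.csv"
raw  <- readLines(path)

# A well-formed 21-column row should contain exactly 20 semicolons.
# (Lines with no ';' at all report 1 here because gregexpr returns -1 on
# no match; that still flags them as outliers, which is all we need.)
n_sep <- lengths(gregexpr(";", raw, fixed = TRUE))
bad   <- which(n_sep != 20)

head(bad)    # line numbers of malformed rows; 2557 should show up
raw[bad[1]]  # eyeball the first offending line
```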
Upvotes: 2
Views: 1234
Reputation: 335
Solved this by installing SlickEdit and using it to edit the lines that caused the trouble. A few characters like ampersand, quotation marks, and apostrophes were consistently encoded as entities ending in a semicolon - e.g. &amp; instead of just &. Since semicolon was the separator in the text file, this broke the parsing with fread.
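An alternative to editing the file by hand: since the trouble is entities whose trailing semicolon collides with the field separator, the entities can be decoded in a pass over the raw text before parsing. A minimal sketch, assuming the offenders are the ampersand, quotation-mark, and apostrophe entities (the exact entity names are an assumption based on the characters mentioned above):

```r
library(data.table)

path <- "C:/Users/Jesper/OneDrive/UdbudsVagten/BBR/CO11700T.csv"
raw  <- readLines(path)

# Decode the entities (assumed forms) so their trailing ';' no longer
# collides with the semicolon field separator.
raw <- gsub("&amp;",  "&",  raw, fixed = TRUE)
raw <- gsub("&quot;", "\"", raw, fixed = TRUE)
raw <- gsub("&apos;", "'",  raw, fixed = TRUE)

# fread can parse a character vector of lines directly via its text argument.
dt <- fread(text = raw, sep = ";")
```

For a file this size it may be gentler on memory to write the cleaned lines back to disk with writeLines and fread the new file instead.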
Upvotes: 1