Reputation: 335
I have a number of CSV files of different sizes, but all somewhat big. Using read.csv to read them into R takes longer than I've been patient to wait so far (several hours). I managed to read the biggest file (2.6 GB) very quickly (in less than a minute) with data.table's fread.
My problem occurs when I try to read a file of half the size. I get the following error message:
```
Error in fread("C:/Users/Jesper/OneDrive/UdbudsVagten/BBR/CO11700T.csv", :
  Expecting 21 cols, but line 2557 contains text after processing all cols.
  It is very likely that this is due to one or more fields having embedded
  sep=';' and/or (unescaped) '\n' characters within unbalanced unescaped
  quotes. fread cannot handle such ambiguous cases and those lines may not
  have been read in as expected. Please read the section on quotes in ?fread.
```
Through research I've found suggestions to add quote = "" to the call, but that doesn't help me. I've also tried the bigmemory package, but R crashes when I do. I'm on a 64-bit system with 8 GB of RAM.

I know there are quite a few threads on this subject, but I haven't been able to solve the problem with any of the suggested solutions. I would really like to use fread (given my good experience with the bigger file), and it seems like there should be some way to make it work - I just can't figure it out.
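For reference, one way to locate and inspect the offending rows is to read the file as raw lines and count separators per line. This is a minimal diagnostic sketch, assuming the path from the error message above and that a well-formed 21-column row carries exactly 20 semicolons:

```r
# Diagnostic sketch: find rows whose separator count doesn't match 21 columns.
path <- "C:/Users/Jesper/OneDrive/UdbudsVagten/BBR/CO11700T.csv"
raw  <- readLines(path)

# A well-formed 21-column row should contain exactly 20 semicolons.
# (Lines with no ';' at all report 1 here because gregexpr returns -1 on
# no match; that still flags them as outliers, which is all we need.)
n_sep <- lengths(gregexpr(";", raw, fixed = TRUE))
bad   <- which(n_sep != 20)

head(bad)    # line numbers of malformed rows; 2557 should show up
raw[bad[1]]  # eyeball the first offending line
```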
Upvotes: 2
Views: 1234
Reputation: 335
Solved this by installing SlickEdit and using it to edit the lines that caused the trouble. A few characters like ampersand, quotation marks, and apostrophes were consistently encoded as entities ending in a semicolon - e.g. &amp; instead of just &. Since semicolon was the separator in the text file, this broke the parsing with fread.
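An alternative to editing the file by hand: since the trouble is entities whose trailing semicolon collides with the field separator, the entities can be decoded in a pass over the raw text before parsing. A minimal sketch, assuming the offenders are the ampersand, quotation-mark, and apostrophe entities (the exact entity names are an assumption based on the characters mentioned above):

```r
library(data.table)

path <- "C:/Users/Jesper/OneDrive/UdbudsVagten/BBR/CO11700T.csv"
raw  <- readLines(path)

# Decode the entities (assumed forms) so their trailing ';' no longer
# collides with the semicolon field separator.
raw <- gsub("&amp;",  "&",  raw, fixed = TRUE)
raw <- gsub("&quot;", "\"", raw, fixed = TRUE)
raw <- gsub("&apos;", "'",  raw, fixed = TRUE)

# fread can parse a character vector of lines directly via its text argument.
dt <- fread(text = raw, sep = ";")
```

For a file this size it may be gentler on memory to write the cleaned lines back to disk with writeLines and fread the new file instead.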
Upvotes: 1