Abhinav Rawat

Reputation: 452

data.table::fread() is crashing for bigger CSV files

I am processing big CSV files (~500-700 MB), so I am reading them chunk by chunk. I tried the read.csv() function, but it gets very slow as the number of rows to skip increases, so I found data.table::fread() to be a much faster way to read a file (per R-Bloggers and Stack Overflow). When I read a 60 MB CSV file with fread() it works fine (Reading 60MB file), but when I tried it on a bigger file (~450 MB) of the same type, it shows "R Session Aborted" (Reading 450MB file). Both files have the same structure; they only differ in size. I am not able to understand why it is not working, as people are reading even bigger files with it.

Here is my code snippet:

library(data.table)

ffName <- "Bund001.csv"

s <- Sys.time()

# Column names supplied explicitly: with skip > 0, fread() never sees the header row
ColNamesVector <- c("RIC", "Date", "Time", "GMT_Offset", "Type", "Price",
                    "Volume", "Bid_Price", "Bid_Size", "Ask_Price",
                    "Ask_Size", "Qualifiers")

# Read one 100,000-row chunk starting 400,000 rows into the file
rawData <- fread(ffName, sep = ",", nrows = 100000, skip = 400000,
                 col.names = ColNamesVector)

print(Sys.time() - s)
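
For context, a minimal sketch of the chunk-by-chunk loop I describe above; chunkSize matches the snippet, and nTotalRows is a hypothetical placeholder for the file's row count:

chunkSize <- 100000
nTotalRows <- 4000000  # placeholder: assumed known row count, for illustration only

for (offset in seq(0, nTotalRows - 1, by = chunkSize)) {
  # skip = offset mirrors the snippet above; if the file has a header row,
  # skip = offset + 1 would keep it out of the first chunk
  chunk <- fread(ffName, sep = ",", nrows = chunkSize, skip = offset,
                 col.names = ColNamesVector)
  # ... process chunk here ...
}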

Upvotes: 1

Views: 2116

Answers (2)

Matt Dowle

Reputation: 59612

Did you check NEWS first? (Other tips are on the data.table Support page.)

The screenshot included in your question shows you are using 1.10.4. As luck would have it, NEWS currently shows that 14 improvements have been made to fread since then, and many are relevant to your question. Please try the dev version. The installation page explains that a pre-compiled binary for Windows is provided and how to get it; you don't need any tools installed. That page also explains how to revert easily should it not work out.

Please try v1.10.5 from dev and accept this answer if that fixes it.
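
A minimal sketch of what that looks like, assuming the drat repository URL given on the data.table installation page (verify the current URL there):

# Install the dev version; the repo URL below is assumed from the
# data.table installation page -- check there for the current one
install.packages("data.table", repos = "https://Rdatatable.github.io/data.table")

packageVersion("data.table")  # should now report the dev version, e.g. 1.10.5

# To revert to the CRAN release:
install.packages("data.table")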

Upvotes: 4

Severin Pappadeux

Reputation: 20120

It is not about size; it means your CSV is slightly out of spec.

I would advise trying readr; it is a bit slower but more tolerant of errors:

https://github.com/tidyverse/readr
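
A minimal sketch mirroring the fread() call from the question (file name, chunk size, and offset are taken from there; col_names supplies the headers because skip bypasses the header row):

library(readr)

ColNamesVector <- c("RIC", "Date", "Time", "GMT_Offset", "Type", "Price",
                    "Volume", "Bid_Price", "Bid_Size", "Ask_Price",
                    "Ask_Size", "Qualifiers")

# skip and n_max are readr's analogues of fread()'s skip and nrows
rawData <- read_csv("Bund001.csv", col_names = ColNamesVector,
                    skip = 400000, n_max = 100000)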

Upvotes: -3
