Abhinav Rawat

Reputation: 452

data.table::fread() is crashing for bigger CSV files

I am processing big CSV files (~500-700 MB), so I am reading them chunk by chunk. I tried the read.csv() function, but it gets very slow as the number of rows to skip increases, so I found data.table::fread() to be a much faster way to read a file (per R-Bloggers and Stack Overflow). When I read a 60 MB CSV file with fread() it works fine (Reading 60MB file), but when I tried it on a bigger file (~450 MB) of the same type, it shows "R Session Aborted" (Reading 450MB file). Both files have the same structure; they only differ in size. I am not able to understand why it is not working, as people are reading even bigger files with it.

Here is my code snippet:

library(data.table)

ffName <- "Bund001.csv"

s <- Sys.time()

# Column names supplied explicitly: with skip > 0, fread() never sees the header row
ColNamesVector <- c("RIC", "Date", "Time", "GMT_Offset", "Type", "Price",
                    "Volume", "Bid_Price", "Bid_Size", "Ask_Price",
                    "Ask_Size", "Qualifiers")

# Read one 100,000-row chunk starting 400,000 rows into the file
rawData <- fread(ffName, sep = ",", nrows = 100000, skip = 400000,
                 col.names = ColNamesVector)

print(Sys.time() - s)
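
For context, a minimal sketch of the chunk-by-chunk loop I describe above; chunkSize matches the snippet, and nTotalRows is a hypothetical placeholder for the file's row count:

chunkSize <- 100000
nTotalRows <- 4000000  # placeholder: assumed known row count, for illustration only

for (offset in seq(0, nTotalRows - 1, by = chunkSize)) {
  # skip = offset mirrors the snippet above; if the file has a header row,
  # skip = offset + 1 would keep it out of the first chunk
  chunk <- fread(ffName, sep = ",", nrows = chunkSize, skip = offset,
                 col.names = ColNamesVector)
  # ... process chunk here ...
}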

Upvotes: 1

Views: 2116

Answers (2)

Matt Dowle

Reputation: 59612

Did you check NEWS first? (Other tips are on the data.table Support page.)

The screenshot included in your question shows you are using 1.10.4. As luck would have it, NEWS currently shows that 14 improvements have been made to fread since then, and many are relevant to your question. Please try the dev version. The installation page explains that a pre-compiled binary for Windows is provided and how to get it; you don't need any tools installed. That page also explains how to revert easily should it not work out.

Please try v1.10.5 from dev and accept this answer if that fixes it.
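
A minimal sketch of what that looks like, assuming the drat repository URL given on the data.table installation page (verify the current URL there):

# Install the dev version; the repo URL below is assumed from the
# data.table installation page -- check there for the current one
install.packages("data.table", repos = "https://Rdatatable.github.io/data.table")

packageVersion("data.table")  # should now report the dev version, e.g. 1.10.5

# To revert to the CRAN release:
install.packages("data.table")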

Upvotes: 4

Severin Pappadeux

Reputation: 20120

It is not about size; it means your CSV is slightly out of spec.

I would advise trying readr; it is a bit slower but more tolerant of errors:

https://github.com/tidyverse/readr
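
A minimal sketch mirroring the fread() call from the question (file name, chunk size, and offset are taken from there; col_names supplies the headers because skip bypasses the header row):

library(readr)

ColNamesVector <- c("RIC", "Date", "Time", "GMT_Offset", "Type", "Price",
                    "Volume", "Bid_Price", "Bid_Size", "Ask_Price",
                    "Ask_Size", "Qualifiers")

# skip and n_max are readr's analogues of fread()'s skip and nrows
rawData <- read_csv("Bund001.csv", col_names = ColNamesVector,
                    skip = 400000, n_max = 100000)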

Upvotes: -3
