Reputation: 452
I am processing big CSV files (~500-700 MB), so I am reading them chunk by chunk. I tried the read.csv() function, but it gets very slow as the number of rows to skip increases, so I found data.table::fread() to be a much faster way to read a file (per R-Bloggers and Stack Overflow). When I read a 60 MB CSV file with fread() it works fine, but when I try it on a bigger file (~450 MB) of the same type it shows "R Session Aborted".
Both files have the same structure and differ only in size. I cannot understand why it is not working, since people are reading even bigger files with it.
Here is my code snippet:
library(data.table)

ffName <- "Bund001.csv"
s <- Sys.time()  # time the read
ColNamesVector <<- c("RIC", "Date", "Time", "GMT_Offset", "Type", "Price",
                     "Volume", "Bid_Price", "Bid_Size", "Ask_Price",
                     "Ask_Size", "Qualifiers")
# Read a 100,000-row chunk, skipping the first 400,000 rows
rawData <- fread(ffName, sep = ",", nrows = 100000, skip = 400000,
                 col.names = ColNamesVector)
print(Sys.time() - s)
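For context, a minimal sketch of what the full chunk-by-chunk loop might look like (the chunk count here is an illustrative assumption; ColNamesVector is as defined above):

library(data.table)
chunkSize <- 100000   # same chunk size as the snippet above
nChunks <- 50         # illustrative; in practice derive this from the file's row count
for (i in seq_len(nChunks)) {
  chunk <- fread("Bund001.csv", sep = ",",
                 nrows = chunkSize,
                 skip = (i - 1) * chunkSize,
                 col.names = ColNamesVector)
  # ... process this chunk before reading the next ...
}

Note that each call has to re-scan the file from the top to honour skip, which is why reads get slower as skip grows.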
Upvotes: 1
Views: 2116
Reputation: 59612
Did you check NEWS first? (Other tips are on the data.table Support page.)
The screenshot included in your question shows you are using 1.10.4. As luck would have it, NEWS currently shows that 14 improvements have been made to fread since then, and many of them are relevant to your question. Please try dev. The installation page explains that a pre-compiled binary for Windows is built for you and how to get it; you don't need to have any tools installed. That page also explains how to revert easily should it not work out.
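For reference, a minimal sketch of the dev install on Windows (the repository URL is the one the installation page gave at the time of writing and may since have changed):

packageVersion("data.table")  # confirm which version you are running now
# Install the pre-compiled Windows binary of the dev version; no build tools needed
install.packages("data.table",
                 repos = "https://Rdatatable.github.io/data.table")
# To revert, reinstall the CRAN release:
# install.packages("data.table")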
Please try v1.10.5 from dev and accept this answer if that fixes it.
Upvotes: 4
Reputation: 20120
It is not about size; it means your CSV is slightly out of spec. I would advise trying readr; it is a bit slower but more tolerant of errors:
https://github.com/tidyverse/readr
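If you go that route, a minimal sketch using the file and column names from the question (read_csv() records parse failures instead of crashing, and problems() lets you inspect them):

library(readr)
rawData <- read_csv("Bund001.csv",
                    col_names = ColNamesVector,  # column names from the question
                    skip = 400000,
                    n_max = 100000)
problems(rawData)  # shows any rows that did not parse cleanly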
Upvotes: -3