Reputation: 7895
I'm trying to open a 2.2G file using fread from the data.table package, but I keep getting the same error (it works for other files, which are under 1G though):
library(data.table)
data.table 1.9.4 For help type: ?data.table
*** NB: by=.EACHI is now explicit. See README to restore previous behaviour.
train = data.table::fread('train.csv')
Error in data.table::fread("train.csv") :
file not found: train.csv
Of course, the file is present (read.csv() works, but is really slow).
I'm running Ubuntu 12.04 LTS on an i686. I'd appreciate any help!
NOTE: The file I'm trying to read is 'train.gz', which can be found at: https://www.kaggle.com/c/tradeshift-text-classification/data.
It's a 2.2G csv file, pretty standard.
EDIT: When I use verbose=TRUE, it says:
Input contains no \n. Taking this to be a filename to open
Upvotes: 3
Views: 3737
Reputation: 301
To open large files on a 32-bit Linux system, one needs to supply the O_LARGEFILE option to the open function, which fread doesn't do. It's the open call that actually fails, but the failure is erroneously reported as a "file not found" error.
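To illustrate the point about the misleading message, here is a minimal C sketch of opening a file and reporting the real reason for a failure from errno. This is not data.table's actual code, and the helper name open_report is hypothetical. On a 32-bit build without large file support, opening a file larger than 2 GiB typically fails with EOVERFLOW, not ENOENT, so printing strerror(errno) would have pointed at the real problem.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: open `path` read-only and, on failure, print the
   actual reason taken from errno instead of assuming the file is missing.
   Returns the file descriptor, or -1 on failure with errno preserved. */
static int open_report(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd == -1) {
        int saved = errno;  /* fprintf may modify errno, so keep a copy */
        fprintf(stderr, "cannot open %s: %s\n", path, strerror(saved));
        errno = saved;
    }
    return fd;
}
```

With a helper like this, a genuinely missing file reports the ENOENT message, while a file that is too large for a 32-bit off_t would be expected to report the EOVERFLOW message instead.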
Another way to enable large file support is to pass the -D_FILE_OFFSET_BITS=64 option to the compiler while installing the package. Unload and remove data.table, then put the following into ~/.R/Makevars:
CFLAGS=-D_FILE_OFFSET_BITS=64
and then run R CMD INSTALL /path/to/data.table_X.Y.Z.tar.gz. The newly installed package will successfully open large files on a 32-bit system.
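As a rough way to see what the flag changes, the C sketch below defines _FILE_OFFSET_BITS in the source (equivalent to passing -D_FILE_OFFSET_BITS=64 on the command line) and checks the width of off_t. The helper names off_t_bits and fits_in_off_t are hypothetical, and the 2200000000-byte figure in the usage note is only an approximation of the 2.2G file from the question.

```c
/* Must be defined before any system header is included; this is what
   -D_FILE_OFFSET_BITS=64 does on the compiler command line. */
#define _FILE_OFFSET_BITS 64

#include <assert.h>
#include <limits.h>
#include <sys/types.h>

/* Width of off_t, in bits, as seen by this translation unit. */
static int off_t_bits(void)
{
    return (int)(sizeof(off_t) * CHAR_BIT);
}

/* Hypothetical helper: can a file of `size` bytes be addressed with off_t? */
static int fits_in_off_t(unsigned long long size)
{
    /* Sanity check so the shift below stays within unsigned long long. */
    assert(sizeof(off_t) <= sizeof(unsigned long long));

    /* Largest value of a signed integer type that is off_t_bits() wide. */
    unsigned long long max = (1ULL << (off_t_bits() - 1)) - 1ULL;
    return size <= max;
}
```

Without the define, a 32-bit glibc build has a 32-bit off_t whose maximum is 2147483647, so fits_in_off_t(2200000000ULL) would return 0 there. With the define, off_t is 64 bits wide and the same call returns 1.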
Upvotes: 3
Reputation: 7895
Well, just to close the topic: I upgraded my Ubuntu to x86-64, and now fread works fine.
Just a summary to help the developers:
1-Downloaded a huge file (2.2G in this case)
2-Tried to read it with fread, and got the error: file not found: train.csv
I was using Ubuntu 12.04 LTS x86 and the latest stable version of R.
As pointed out, smaller files (~731 MB) worked in this scenario. Thanks for the help anyway!
Upvotes: 2