Fernando

Reputation: 7895

fread: file not found

I'm trying to open a 2.2G file using fread from the data.table package, but I keep getting the same error (it works for other files, though those are all smaller than 1G):

library(data.table)
data.table 1.9.4  For help type: ?data.table
*** NB: by=.EACHI is now explicit. See README to restore previous behaviour.

train  = data.table::fread('train.csv')

Error in data.table::fread("train.csv") : file not found: train.csv

Of course, the file is present (read.csv() works, but is really slow). I'm running Ubuntu 12.04 LTS on an i686 machine. I'd appreciate any help!

NOTE: The file I'm trying to read is 'train.gz', which can be found at: https://www.kaggle.com/c/tradeshift-text-classification/data.

It's a 2.2G csv file, pretty standard.

EDIT: When I use verbose=TRUE, it says:

Input contains no \n. Taking this to be a filename to open

Upvotes: 3

Views: 3737

Answers (2)

Dimitri

Reputation: 301

To open large files on a 32-bit Linux system, one needs to pass the O_LARGEFILE flag to the open function, which fread doesn't do. It is actually the open call that fails, but the failure is erroneously reported as a "file not found" error.
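
As a rough illustration of that flag (this is not data.table's actual code), here is a minimal C sketch that opens a file read-only with O_LARGEFILE and returns errno on failure. The helper name try_open_large is hypothetical, and the sketch assumes Linux with glibc, where defining _GNU_SOURCE exposes O_LARGEFILE. According to the open(2) man page, a 32-bit program without large file support that opens a file bigger than 2^31 - 1 bytes gets EOVERFLOW rather than ENOENT, so returning the real errno says more than a blanket "file not found".

```c
#define _GNU_SOURCE /* assumption: glibc; exposes O_LARGEFILE, must precede every #include */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: try to open `path` read-only with large file support.
 * Returns 0 on success, or the errno value set by open() on failure
 * (e.g. ENOENT for a missing file, EOVERFLOW for a file that is too large). */
static int try_open_large(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY | O_LARGEFILE);
    if (fd == -1) {
        return errno; /* keep the real reason instead of guessing */
    }

    close(fd);
    return 0;
}
```

A caller could pass the returned value to strerror() to print the actual cause of the failure.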

Another way to enable large file support is to pass a -D_FILE_OFFSET_BITS=64 option to the compiler while installing the package. Unload and remove data.table, put the following into ~/.R/Makevars:

CFLAGS=-D_FILE_OFFSET_BITS=64

and then issue R CMD INSTALL /path/to/data.table_X.Y.Z.tar.gz. The newly installed package will successfully open large files on a 32-bit system.
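
To see what the compiler option changes, the following C sketch defines the same macro in the source instead of on the command line and reports the width of off_t. The helper names off_t_bits and file_size are hypothetical and only serve the illustration. With the macro in effect, off_t should be 64 bits wide even on a 32-bit system, so sizes above 2 GiB fit.

```c
#define _FILE_OFFSET_BITS 64 /* same effect as -D_FILE_OFFSET_BITS=64; must precede every #include */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Width of off_t in bits under the current compile settings. */
static int off_t_bits(void)
{
    return (int)(sizeof(off_t) * 8);
}

/* Hypothetical helper: store the size of `path` in *size_out.
 * Returns 0 on success, or the errno value of the failing call;
 * *size_out is left untouched on failure. */
static int file_size(const char *path, long long *size_out)
{
    assert(path != NULL && size_out != NULL);

    int fd = open(path, O_RDONLY);
    if (fd == -1) {
        return errno;
    }

    struct stat st;
    int err = 0;
    if (fstat(fd, &st) == -1) {
        err = errno;
    } else {
        *size_out = (long long)st.st_size;
    }

    close(fd);
    return err;
}
```

If off_t_bits() reports 32 on your machine, the macro was defined too late or not at all, and opening the 2.2G file from such a build would be expected to fail.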

Upvotes: 3

Fernando

Reputation: 7895

Well, just to close the topic: I upgraded my Ubuntu to x86-64, and now fread works fine.

Just a summary to help the developers:

1. Download a huge file (2.2G in this case).

2. Try to read it with fread, and get the error: file not found: train.csv

I was using Ubuntu 12.04 LTS x86 and the latest stable version of R.

As pointed out, smaller files (~731 MB) worked in this scenario. Thanks for the help anyway!

Upvotes: 2
