Reputation: 1042
I'm reading a file into R with fread in two ways:
fread("file:///C:/Users/Desktop/ads.csv")
fread("C:/Users/Desktop/ads.csv") # Just omitted "file:///"
I've observed the runtime to be very different:
library(data.table)
library(microbenchmark)
microbenchmark(
  fread("file:///C:/Users/Desktop/ads.csv"),
  fread("C:/Users/Desktop/ads.csv")
)
Unit: microseconds
                                      expr      min        lq      mean    median       uq       max neval cld
 fread("file:///C:/Users/Desktop/ads.csv") 5755.975 6027.4735 6696.7807 6235.3365 6506.652 41257.476   100   b
         fread("C:/Users/Desktop/ads.csv")  525.492  584.0215  673.7166  647.4745  727.703  1476.191   100  a
Why does the runtime vary so much? I didn't see a noticeable difference between the two variants when I used read.csv(), though.
Upvotes: 25
Views: 2038
Reputation: 34703
The following has been added to ?fread:

When input begins with http://, https://, ftp://, ftps://, or file://, fread detects this and downloads the target to a temporary file (at tempfile()) before proceeding to read the file as usual. Secure URLS (ftps:// and https://) are downloaded with curl::curl_download; ftp:// and http:// paths are downloaded with download.file and method set to getOption("download.file.method"), defaulting to "auto"; and file:// is downloaded with download.file with method="internal". NB: this implies that for file://, even files found on the current machine will be "downloaded" (i.e., hard-copied) to a temporary file. See ?download.file for more details.
From the source of fread:

if (str6 == "ftp://" || str7 == "http://" || str7 == "file://") {
  method = if (str7 == "file://") "auto"
    else getOption("download.file.method", default = "auto")
  download.file(input, tmpFile, method = method, mode = "wb", quiet = !showProgress)
}
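To make the extra step concrete, here is a minimal base-R sketch of "copy to a temporary file, then read". It is not fread's code: file.copy and read.csv stand in for download.file and fread, and read_via_copy and the sample data are made up for illustration.

```r
# Small CSV standing in for the real file (hypothetical data).
src <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, value = c(2.5, 3.5, 4.5)), src, row.names = FALSE)

# Hypothetical helper: copy the file to a temporary location first,
# then read the copy. file.copy is only a stand-in for download.file.
read_via_copy <- function(path) {
  tmp <- tempfile(fileext = ".csv")
  on.exit(unlink(tmp), add = TRUE)  # clean up the temporary copy
  file.copy(path, tmp, overwrite = TRUE)
  read.csv(tmp)
}

direct <- read.csv(src)        # one pass over the file
copied <- read_via_copy(src)   # one copy plus one pass

# Same data either way; only the amount of work differs.
same_result <- identical(direct, copied)
```

Both calls return the same data frame; the second simply pays for an additional full copy of the file before any parsing starts.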
That is, your file is being "downloaded" to a temporary file, which should consist of deep-copying the contents of the file to a temporary location. file:// is not really intended for use on local files, but on files in a network that need to be downloaded locally before being read (IIUC; FWIW, this is what fread's testing regime uses to imitate file download while testing on CRAN, where external file download is impossible).
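If a local path might arrive with the file:// prefix attached, one option is to strip it before calling fread so the copy is skipped. The helper below is a rough sketch: strip_file_scheme is a hypothetical name, not part of data.table, and it only handles the plain file:///C:/... and file:///path forms (no host part, no percent-decoding).

```r
# Hypothetical helper (not a data.table function): turn a local file:// URL
# into an ordinary path; anything else is returned unchanged.
strip_file_scheme <- function(path) {
  # Windows drive form: file:///C:/dir/file.csv -> C:/dir/file.csv
  if (grepl("^file:///[A-Za-z]:", path)) {
    return(sub("^file:///", "", path))
  }
  # Unix form: file:///home/me/file.csv -> /home/me/file.csv
  sub("^file://", "", path)
}

win_path  <- strip_file_scheme("file:///C:/Users/Desktop/ads.csv")
unix_path <- strip_file_scheme("file:///tmp/ads.csv")
plain     <- strip_file_scheme("C:/Users/Desktop/ads.csv")

# Intended use (requires data.table):
# fread(strip_file_scheme("file:///C:/Users/Desktop/ads.csv"))
```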
I also notice that your timings are on the order of microseconds, which could explain the discrepancy vs. read.csv: the copy adds a roughly fixed cost, and that cost only stands out when the read itself is very fast. Imagine read.csv takes 1 second to read the file, while fread takes .01 seconds; file copying takes .05 seconds. Then read.csv will look about the same in both cases (1 vs. 1.05 seconds), while fread looks substantially slower for the file:// case (.01 vs. .06 seconds).
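The same arithmetic in R, using the made-up timings from the paragraph above (they are illustrative, not measurements) next to the medians reported in the question:

```r
# Illustrative timings in seconds (assumed, not measured).
copy_time     <- 0.05
read_csv_time <- 1
fread_time    <- 0.01

# Relative slowdown when a fixed copy cost is added to a read.
slowdown <- function(read_time, copy_time) (read_time + copy_time) / read_time

read_csv_ratio <- slowdown(read_csv_time, copy_time)  # 1.05x: barely visible
fread_ratio    <- slowdown(fread_time, copy_time)     # 6x: very visible

# Ratio of the medians from the question's benchmark (microseconds);
# this works out to a bit under 10x.
observed_ratio <- 6235.3365 / 647.4745
```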
Upvotes: 26