Johan Rosa

Reputation: 3152

download.file() downloads a corrupt xls file

I am trying to create a package to download, import and clean data from the Dominican Republic Central Bank web page. I wrote all the code on rstudio.cloud and everything works just fine there, but when I try the functions on my local machine they do not work.

After digging into each function, I realized that the problem is the downloaded file: it is corrupt.

I am including the first steps of a function just to illustrate my issue.


# Packages
library(readxl)

# file URL
url <- paste0("https://cdn.bancentral.gov.do/documents/",
              "estadisticas/precios/documents/",
              "ipc_base_2010.xls?v=1570116997757")

# temporary path
file_path <- tempfile(pattern = "", fileext = ".xls")

# downloading 
download.file(url, file_path, quiet = TRUE)

# reading the file
ipc_general <- readxl::read_excel(
            file_path,
            sheet = 1,
            col_names = FALSE,
            skip = 7
        )

Error: 
  filepath: C:\Users\Johan Rosa\AppData\Local\Temp\RtmpQ1rOT3\2a74778a1a64.xls
  libxls error: Unable to open file

I am using temporary files, but that is not the problem: if you download the file into your working directory instead, the problem persists.

I want to know:

  1. Why does this code work on rstudio.cloud but not locally?
  2. What can I do to get the job done? (alternative approaches, packages, functions)

By the way, I am using Windows 10.

Edit

Answer:

1- rstudio.cloud runs on Linux, but on Windows I need to adjust the download.file() call:

2- download.file(url, file_path, quiet = TRUE, mode = "wb")

This is what I was looking for.
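For reference, this is a sketch of the snippet from the question with that one change applied, wrapped in a function. The name `read_ipc_general()` and its arguments are hypothetical, chosen only for illustration; calling it still needs network access and the readxl package.

```r
# Sketch: the question's code with mode = "wb" added.
# read_ipc_general() is a hypothetical name, not part of any package.
read_ipc_general <- function(
    url = paste0("https://cdn.bancentral.gov.do/documents/",
                 "estadisticas/precios/documents/",
                 "ipc_base_2010.xls?v=1570116997757"),
    skip = 7) {
  # keep the .xls extension so readxl recognises the format
  file_path <- tempfile(pattern = "", fileext = ".xls")
  on.exit(unlink(file_path), add = TRUE)

  # mode = "wb" writes the bytes unchanged; the default "w" is a text-mode
  # write, which on Windows turns every \n byte into \r\n and corrupts the file
  download.file(url, file_path, quiet = TRUE, mode = "wb")

  readxl::read_excel(file_path, sheet = 1, col_names = FALSE, skip = skip)
}

# ipc_general <- read_ipc_general()
```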

Now I have a different problem: I need a way to detect whether the function is running on Linux or Windows, so I can set that argument accordingly.

I could write a new download function that uses if/else on the result of .Platform$OS.type.
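A minimal sketch of that idea, assuming a hypothetical helper named `platform_mode()`. `.Platform$OS.type` is `"windows"` on Windows and `"unix"` on Linux and macOS.

```r
# Hypothetical helper: choose the download.file() mode from the OS type.
platform_mode <- function() {
  if (.Platform$OS.type == "windows") "wb" else "w"
}

# usage (needs network):
# download.file(url, file_path, quiet = TRUE, mode = platform_mode())
```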

Or, can I set mode = "wb" for all download.file() calls?

Do you have any recommendations?

Upvotes: 1

Views: 720

Answers (1)

captcoma

Reputation: 1898

From the documentation of download.file():

The choice of binary transfer (mode = "wb" or "ab") is important on Windows, since unlike Unix-alikes it does distinguish between text and binary files and for text transfers changes \n line endings to \r\n (aka CRLF).

Code written to download binary files must use mode = "wb" (or "ab"), but the problems incurred by a text transfer will only be seen on Windows.
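To see why a text transfer breaks a binary file, the sketch below imitates on raw bytes what a Windows text-mode write does: a 0x0D (\r) byte is inserted before every 0x0A (\n) byte. `to_crlf()` is a hypothetical helper written only for this illustration, and the input bytes are made up.

```r
# Hypothetical helper: mimic a Windows text-mode write, which inserts a
# carriage return (0x0D) before every line feed (0x0A) byte.
to_crlf <- function(bytes) {
  out <- lapply(bytes, function(b) {
    if (b == as.raw(0x0A)) as.raw(c(0x0D, 0x0A)) else b
  })
  do.call(c, out)
}

# Made-up bytes that happen to include 0x0A, as binary formats often do
original <- as.raw(c(0xD0, 0xCF, 0x0A, 0x11, 0x0A, 0xE0))
converted <- to_crlf(original)

length(original)   # 6
length(converted)  # 8: two extra bytes, so every later offset is shifted
```

A reader such as libxls relies on fixed byte offsets inside the file, so those shifted bytes are enough to make it fail to open.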

From the source of download.file():

head(print(download.file), 12)

function (url, destfile, method, quiet = FALSE, mode = "w", cacheOK = TRUE,
    extra = getOption("download.file.extra"), headers = NULL,
    ...)
{
    destfile
    method <- if (missing(method))
        getOption("download.file.method", default = "auto")
    else match.arg(method, c("auto", "internal", "wininet", "libcurl",
        "wget", "curl", "lynx"))
    if (missing(mode) && length(grep("\\.(gz|bz2|xz|tgz|zip|rd[as]|RData)$",
        URLdecode(url))))
        mode <- "wb"

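So, looking at the source: if you do not set mode, the function uses "w", unless the URL ends with one of the extensions in that regular expression (.gz, .bz2, .xz, .tgz, .zip, .rda, .rds, .RData), in which case it switches to "wb". Your .xls URL does not match, so the file is written in text mode, which corrupts it on Windows (that is why you get the "Unable to open file" error).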
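The extension check can be tried directly on the URL from the question. This sketch copies the pattern from the source excerpt above; other R versions may use a different list.

```r
# Extension test copied from the download.file() excerpt above
binary_ext <- "\\.(gz|bz2|xz|tgz|zip|rd[as]|RData)$"

xls_url <- paste0("https://cdn.bancentral.gov.do/documents/",
                  "estadisticas/precios/documents/",
                  "ipc_base_2010.xls?v=1570116997757")

# FALSE: .xls is not in the list (and the URL ends with a query string),
# so mode stays at the text default "w"
grepl(binary_ext, URLdecode(xls_url))

# TRUE: a URL ending in .zip would switch mode to "wb" automatically
grepl(binary_ext, "https://example.com/data.zip")
```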

As far as I can tell, on Unix-alikes (e.g. Linux) "w" and "wb" behave the same, because they do not differentiate between text and binary files; Windows does.

So you could set mode = "wb" for all download.file() calls (unless you are downloading a text file on Windows and want its line endings converted to \r\n); this will not change the function's behavior on Linux.
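One way to apply that everywhere in a package is a thin wrapper that defaults to binary mode. This is a sketch: `download_binary()` is a hypothetical name, and it simply forwards everything else to download.file().

```r
# Hypothetical wrapper: same as download.file(), but defaults to binary mode
# so files are written byte-for-byte on every platform.
download_binary <- function(url, destfile, ..., mode = "wb") {
  download.file(url, destfile, ..., mode = mode)
}

# usage (needs network):
# download_binary(url, file_path, quiet = TRUE)
# ipc_general <- readxl::read_excel(file_path, sheet = 1,
#                                   col_names = FALSE, skip = 7)
```

Because "w" and "wb" are equivalent on Linux, the wrapper behaves the same on rstudio.cloud and only changes the result on Windows.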

Upvotes: 2
