jalapic
jalapic

Reputation: 14192

Error when trying to read excel file from web site

I'm trying to download the xlsx file that is available at the following url. If you go to the website and click the link, it will download as a file on your computer. However, I want to automate this process. I have tried the following:

library(RCurl)
download.file("https://dshs.texas.gov/coronavirus/TexasCOVID19DailyCountyCaseCountData.xlsx", "temp.xlsx")
library(readxl)
tmp <- read_xlsx("temp.xlsx")
# Error: Evaluation error: error reading from the connection.

This method does download a temp.xlsx file to my drive. However, if you try and manually click on it to open, excel fails to open it. It knows it's size, but is unable to open.
.

readxl::read_xlsx("https://dshs.texas.gov/coronavirus/TexasCOVID19DailyCountyCaseCountData.xlsx")
    # Error: `path` does not exist: ‘https://dshs.texas.gov/coronavirus/TexasCOVID19DailyCountyCaseCountData.xlsx’

Both of these methods are my go-to for downloading excel files from websites. Is there some specific reason why these methods don't work here?

Upvotes: 2

Views: 164

Answers (1)

lroha
lroha

Reputation: 34291

When downloading certain file formats on Windows you need to specify that it should be a binary rather than the (usual) default of a text transfer - from the download.file() documentation:

The choice of binary transfer (mode = "wb" or "ab") is important on Windows, since unlike Unix-alikes it does distinguish between text and binary files and for text transfers changes \n line endings to \r\n (aka ‘CRLF’).

On Windows, if mode is not supplied (missing()) and url ends in one of .gz, .bz2, .xz, .tgz, .zip, .rda, .rds or .RData, mode = "wb" is set such that a binary transfer is done to help unwary users.

Code written to download binary files must use mode = "wb" (or "ab"), but the problems incurred by a text transfer will only be seen on Windows.

In this case so that the file is written correctly use:

download.file("https://dshs.texas.gov/coronavirus/TexasCOVID19DailyCountyCaseCountData.xlsx",
              "temp.xlsx", mode = "wb")

Upvotes: 3

Related Questions