Jonno Bourne

Reputation: 1981

R produces "unsupported URL scheme" error when getting data from https sites

R version 3.0.1 (2013-05-16) on Windows 8, knitr version 1.5, RStudio 0.97.551

I am using knitr to render the markdown for my R code. As part of my analysis I download various data sets from the web. knitr is fine with getting data from http sites, but for https sites it generates an "unsupported URL scheme" message. I know that when using the download.file function on a Mac, the method parameter has to be set to "curl" to get data from an https site; however, this doesn't help when using knitr.

What do I need to do so that knitr will download data from https websites?

Edit: Here is the code chunk that returns an error in knitr but runs without error directly in R.

```{r}
fileurl <- "https://dl.dropbox.com/u/7710864/data/csv_hid/ss06hid.csv"
download.file(fileurl, destfile = "C:/Users/xxx/yyy")
```

Upvotes: 11

Views: 25992

Answers (7)

Thomas

Reputation: 44525

Edit (May 2016): As of R 3.3.0, download.file() should handle SSL websites automatically on all platforms, making the rest of this answer moot.

You want something like this:

library(RCurl)
data <- getURL("https://dl.dropbox.com/u/7710864/data/csv_hid/ss06hid.csv",
               ssl.verifypeer=0L, followlocation=1L)

That reads the data into memory as a single string. You'll still have to parse it into a dataset in some way. One strategy is:

writeLines(data,'temp.csv')
read.csv('temp.csv')

You can also parse the data directly without writing to a file:

read.csv(text=data)
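As a minimal sketch of the in-memory route, the snippet below parses a CSV string without touching the network or the disk. The `csv_text` value is a made-up stand-in for whatever `getURL()` returns; it is not the real ss06hid.csv content.

```r
# Stand-in for the string returned by getURL(); the contents are invented.
csv_text <- "id,value\n1,10\n2,20\n"

# read.csv() accepts the raw text through its `text` argument,
# so no temporary file is needed.
df <- read.csv(text = csv_text)
str(df)
```

The same call should work on the real download, as long as the string holds the whole file.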

Edit: A much easier option is actually to use the rio package:

library("rio")
import("https://dl.dropbox.com/u/7710864/data/csv_hid/ss06hid.csv")

This will read directly from the HTTPS URL and return a data.frame.

Upvotes: 9

Michael Szczepaniak

Reputation: 2100

The R download package takes care of the quirky details typically associated with file downloads. For your example, all you would have needed is:

```{r}
library(download)
fileurl <- "https://dl.dropbox.com/u/7710864/data/csv_hid/ss06hid.csv"
download(fileurl, destfile = "C:/Users/xxx/yyy")
```

Upvotes: 1

Renhuai

Reputation: 576

Use setInternet2(use = TRUE) before using the download.file() function. It works on Windows 7.

setInternet2(use = TRUE)
download.file(url, destfile = "test.csv")

Upvotes: 9

user3694373

Reputation: 140

I am sure you have already found a solution to your problem by now.

I was working on an assignment and ran into the same error. I tried some of the tricks above, but they did not work for me, maybe because I am working on a Windows machine.

Anyhow, I changed the link to http: rather than https: and that did the trick.

Here is the relevant chunk of my code:

if (!file.exists("./PeerAssessment2")) {dir.create("./PeerAssessment2")}
fileURL <- "http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(fileURL, destfile = "./PeerAssessment2/Data.zip")

install.packages("R.utils")
library(R.utils)
if (!file.exists("./PeerAssessment2/Data")) {
    bunzip2 ("./PeerAssessment2/Data.zip", destname = "./PeerAssessment2/Data")
}
list.files("./PeerAssessment2")

noaaData <- read.csv ('./PeerAssessment2/Data')

Hope this helps.
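A side note on the decompression step: base R file connections detect bzip2 compression when opened for reading, so `read.csv()` can usually read a `.bz2` file directly and the `bunzip2()` call may not be needed. The sketch below demonstrates this on a small throwaway file rather than the storm data; the file name and contents are invented for illustration.

```r
# Write a tiny bzip2-compressed CSV to a temporary file (illustrative data only).
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "wt")
writeLines(c("id,value", "1,10", "2,20"), con)
close(con)

# read.csv() opens the path with file(), which detects the compression.
df <- read.csv(tmp)
print(df)

unlink(tmp)
```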

Upvotes: 5

user2500444

Reputation: 111

I had the same problem with an https URL: the following code ran perfectly in R but produced "unsupported URL scheme" when knitting to HTML:

temp = tempfile()
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2Factivity.zip", temp)
data = read.csv(unz(temp, "activity.csv"), colClasses = c("numeric", "Date", "numeric"))

I tried all the solutions posted here and nothing worked. In absolute desperation I removed the "s" from "https" in the URL, and everything worked.
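For reference, the scheme swap can be written as a one-line helper. `to_http()` is a hypothetical name, not part of any package, and the trick only works when the server also serves the file over plain http. It also gives up transport encryption, so treat it as a last resort.

```r
# Hypothetical helper: rewrite a leading "https://" to "http://".
# URLs that already use http (or another scheme) are returned unchanged.
to_http <- function(url) {
  sub("^https://", "http://", url)
}

to_http("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2Factivity.zip")
```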

Upvotes: 1

Fabien Barbier

Reputation: 1524

You can use https with the download.file() function by passing "curl" as the method argument:

download.file(url, destination, method = "curl")
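Note that method = "curl" shells out to a curl binary, so it needs one on the PATH. A rough sketch of choosing the method at run time follows. `pick_https_method()` is a hypothetical helper, and the R 3.3.0 cutoff is taken from the note in Thomas's answer rather than verified on every platform.

```r
# Hypothetical helper: choose a download.file() method that can handle https.
pick_https_method <- function() {
  # Assumption from this thread: R >= 3.3.0 handles https by default.
  if (getRversion() >= "3.3.0") {
    return("auto")
  }
  # Older R: fall back to an external curl binary if one is on the PATH.
  if (nzchar(Sys.which("curl"))) {
    return("curl")
  }
  stop("No https-capable download method found; install curl or upgrade R")
}

method <- pick_https_method()
# download.file(url, destination, method = method)
```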

Upvotes: 21

ndou

Reputation: 1178

I had the same issue with knitr and download.file() with an https URL on Windows 8.

You could try calling setInternet2(TRUE) before using the download.file() function. However, I'm not sure whether this fix works on Unix-like systems.

setInternet2(TRUE)  # same effect as setting R_WIN_INTERNET2
fileurl <- "https://dl.dropbox.com/u/7710864/data/csv_hid/ss06hid.csv"
download.file(fileurl, destfile = "C:/Users/xxx/yyy") # now it should work

Source: R documentation (?download.file):

Note that https:// URLs are only supported if --internet2 or environment variable R_WIN_INTERNET2 was set or setInternet2(TRUE) was used (to make use of Internet Explorer internals), and then only if the certificate is considered to be valid.
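Since setInternet2() is Windows-only, and another answer in this thread notes that R 3.3.0 and later handle https without it, a script shared across machines may want to guard the call. The sketch below is one way to do that; `enable_https_on_old_windows()` is a hypothetical name and the version cutoff is an assumption based on that answer.

```r
# Hypothetical helper: call setInternet2(TRUE) only on old Windows builds of R.
# Returns TRUE if the call was made, FALSE otherwise.
enable_https_on_old_windows <- function() {
  is_windows <- .Platform$OS.type == "windows"
  is_old_r <- getRversion() < "3.3.0"  # assumed cutoff, per this thread
  if (is_windows && is_old_r) {
    setInternet2(TRUE)
    return(TRUE)
  }
  FALSE
}

changed <- enable_https_on_old_windows()
```

On other platforms or newer R versions the helper does nothing, so the same chunk can be knitted anywhere.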

Upvotes: 4
