Reputation: 41
I have the URL of a file I want to download and from here directly read into R. It is an .xlsx file from Gapminder. The Webiste to find it is "https://github.com/Gapminder-Indicators/u5mr/blob/master/u5mr-by-gapminder.xlsx".
I have tried two things:
url <- "https://www.gapminder.org/data/documentation/gd005/u5mr-by-gapminder.xlsx"
download.file(url,destfile="example.xlsx")
example <- read_excel("example.xlsx")
with the error
> url <- "https://github.com/Gapminder-Indicators/u5mr/blob/master/u5mr-by-gapminder.xlsx"
> download.file(url,destfile="example.xlsx")
trying URL 'https://github.com/Gapminder-Indicators/u5mr/blob/master/u5mr-by-gapminder.xlsx'
Content type 'text/html; charset=utf-8' length unknown
downloaded 61 KB
> example <- read_excel("example.xlsx")
Error: Evaluation error: zip file 'C:\Users\user\Documents\example.xlsx' cannot be opened.
>
and
library(RCurl)
URL <- "https://www.gapminder.org/data/documentation/gd005/u5mr-by-gapminder.xlsx"
x <- getURL(URL)
out <- read.csv(textConnection(x))
head(out[1:6])
with the result
> library(RCurl)
> URL <- "https://www.gapminder.org/data/documentation/gd005/u5mr-by-gapminder.xlsx"
> x <- getURL(URL)
> out <- read.csv(textConnection(x))
Warning message:
In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
EOF within quoted string
> head(out[1:6])
Error in `[.data.frame`(out, 1:6) : undefined columns selected
So the file is either not downloaded at all or incorrectly read in. How can I simply download that one excel-file from the webpage and read in?
Upvotes: 0
Views: 749
Reputation: 173793
I don't think just changing the url as @James says is enough, (note that the url in question forwards your browser to a different url with the actual file - in your case it's https://raw.githubusercontent.com/Gapminder-Indicators/u5mr/master/u5mr-by-gapminder.xlsx
).
But this is not the only problem. You need to write the downloaded file in binary format using mode = "wb"
before you can open it with read_excel
, also specifying the worksheet that you want to open.
Here's a working example:
library(readxl)
destfile <- path.expand("~/example.xlsx")
url <- paste0("https://raw.githubusercontent.com/Gapminder-Indicators",
"/u5mr/master/u5mr-by-gapminder.xlsx")
download.file(url, destfile = destfile, mode = "wb")
example <- read_excel(destfile, sheet = 2)
and now you have the spreadsheet as a tibble:
print(example)
#> # A tibble: 275 x 305
#> geo.name indicator.name geo indicator `1800` `1801` `1802` `1803` `1804` `1805`
#> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Abkhazia Child mortali~ abkh u5mr NA NA NA NA NA NA
#> 2 Afghani~ Child mortali~ afg u5mr 469. 469. 469. 469. 469. 469.
#> 3 Akrotir~ Child mortali~ akr_~ u5mr NA NA NA NA NA NA
#> 4 Albania Child mortali~ alb u5mr 375. 375. 375. 375. 375. 375.
#> 5 Algeria Child mortali~ dza u5mr 460. 460. 460. 460. 460. 460.
#> 6 America~ Child mortali~ asm u5mr NA NA NA NA NA NA
#> 7 Andorra Child mortali~ and u5mr NA NA NA NA NA NA
#> 8 Angola Child mortali~ ago u5mr 486. 486. 486. 486. 486. 486.
#> 9 Anguilla Child mortali~ aia u5mr NA NA NA NA NA NA
#> 10 Antigua~ Child mortali~ atg u5mr 474. 470. 466. 462. 458. 455.
#> # ... with 265 more rows, and 295 more variables: `1806` <dbl>, `1807` <dbl>
Upvotes: 4
Reputation: 66834
Despite the URL, the GitHub server doesn't actually serve the file unless raw=true
is passed in as a parameter.
Use this URL instead: https://github.com/Gapminder-Indicators/u5mr/blob/master/u5mr-by-gapminder.xlsx?raw=true
Upvotes: 4