Reputation: 6018
There are a number of fairly detailed answers on SO which cover authenticated login to an aspx site and a download from it. As a complete n00b, I haven't been able to find a simple explanation of how to get data from a web form.
The following MWE is intended as an example only. The question is really meant to teach me how to do this for a wider collection of web pages.
Website:
http://data.un.org/Data.aspx?d=SNA&f=group_code%3a101
What I tried, which (obviously) failed:
test=read.csv('http://data.un.org/Handlers/DownloadHandler.ashx?DataFilter=group_code:101;country_code:826&DataMartId=SNA&Format=csv&c=2,3,4,6,7,8,9,10,11,12,13&s=_cr_engNameOrderBy:asc,fiscal_year:desc,_grIt_code:asc')
This gives me gobbledygook when I run View(test).
Anything that steps me through this or points me in the right direction would be very gratefully received.
Upvotes: 0
Views: 2204
Reputation: 4444
The URL you are accessing with read.csv returns a zipped file. You could download it using httr, say, and write the contents to a temp file:
library(httr)
urlUN <- "http://data.un.org/Handlers/DownloadHandler.ashx?DataFilter=group_code:101;country_code:826&DataMartId=SNA&Format=csv&c=2,3,4,6,7,8,9,10,11,12,13&s=_cr_engNameOrderBy:asc,fiscal_year:desc,_grIt_code:asc"
response <- GET(urlUN)
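# Create the target directory first; writeBin fails if it does not exist.
dir.create("temp", showWarnings = FALSE)
writeBin(content(response, as = "raw"), "temp/temp.zip")
# List the archive contents to get the name of the CSV inside.
fName <- unzip("temp/temp.zip", list = TRUE)$Name
unzip("temp/temp.zip", exdir = "temp")
read.csv(file.path("temp", fName))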
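If you are not sure whether a given download link returns a zip or a plain CSV, one option is to check the first two bytes of the response: zip archives start with the signature "PK". The following is only a sketch in base R; is_zip and read_csv_bytes are hypothetical helper names, not part of httr, and the sketch assumes the archive holds a single CSV.

```r
# Hypothetical helper: TRUE when the bytes start with the zip signature "PK".
is_zip <- function(bytes) {
  length(bytes) >= 2 && identical(bytes[1:2], as.raw(c(0x50, 0x4b)))
}

# Hypothetical helper: read a CSV from raw bytes, unzipping first if needed.
# Assumes the first file in the archive is the CSV of interest.
read_csv_bytes <- function(bytes) {
  if (!is_zip(bytes)) {
    # Plain text response: parse it directly.
    return(read.csv(text = rawToChar(bytes)))
  }
  zip_path <- tempfile(fileext = ".zip")
  out_dir <- tempfile("unzipped")
  writeBin(bytes, zip_path)
  # unzip() creates exdir if necessary and returns the extracted paths.
  files <- unzip(zip_path, exdir = out_dir)
  read.csv(files[1])
}
```

With the response from above, this would be called as read_csv_bytes(content(response, as = "raw")). Using tempfile() also avoids having to create a temp directory by hand.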
Alternatively, Hmisc has a useful getZip function:
library(Hmisc)
urlUN <- "http://data.un.org/Handlers/DownloadHandler.ashx?DataFilter=group_code:101;country_code:826&DataMartId=SNA&Format=csv&c=2,3,4,6,7,8,9,10,11,12,13&s=_cr_engNameOrderBy:asc,fiscal_year:desc,_grIt_code:asc"
unData <- read.csv(getZip(urlUN))
Upvotes: 2
Reputation: 1057
The links are being generated dynamically. The other problem is the content isn't actually at that link. You're making a request to a (very odd and poorly documented) API which will eventually return with the zip file. If you look in the Chrome dev tools as you click on that link you'll see the message and response headers.
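The same headers can also be looked at from R rather than the browser. This is a rough sketch: filename_from_disposition is a hypothetical helper, the header value in the last line is made up for illustration, and I have not checked which headers this particular server actually sends.

```r
# Hypothetical helper: pull the file name out of a Content-Disposition value.
# Returns NA when the header is missing or carries no filename.
filename_from_disposition <- function(disposition) {
  if (is.null(disposition)) return(NA_character_)
  m <- regmatches(disposition,
                  regexec('filename="?([^";]+)"?', disposition))[[1]]
  if (length(m) < 2) NA_character_ else m[2]
}

# With httr installed, a live response could be inspected along these lines
# (urlUN is the download URL from the question):
# response <- httr::GET(urlUN)
# httr::status_code(response)   # HTTP status of the reply
# httr::http_type(response)     # declared content type
# filename_from_disposition(httr::headers(response)[["content-disposition"]])

# Made-up header value, for illustration only:
filename_from_disposition('attachment; filename="example.zip"')
```

A content type or file name pointing at a zip would explain why read.csv on the raw URL produces unreadable output.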
There are a few ways you can solve this. If you know some JavaScript you can script a headless WebKit instance like PhantomJS to load up these pages, simulate click events and wait for a content response, then pipe that to something.
Alternatively, you may be able to finagle httr into treating this like a proper RESTful API. I have no idea if that's even remotely possible. :)
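As a starting point for the API-style approach, the download URL from the question can be assembled from its parts, so that the filter values become function arguments. This is only a sketch: build_un_url is a hypothetical name, and it is an untested assumption that other group or country codes work with the same remaining parameters.

```r
# Hypothetical helper: rebuild the download URL from the question with the
# group and country codes as arguments. The column list (c) and sort order (s)
# are copied verbatim from the question's URL.
build_un_url <- function(group_code, country_code, mart = "SNA") {
  filter <- paste0("group_code:", group_code,
                   ";country_code:", country_code)
  paste0("http://data.un.org/Handlers/DownloadHandler.ashx",
         "?DataFilter=", filter,
         "&DataMartId=", mart,
         "&Format=csv",
         "&c=2,3,4,6,7,8,9,10,11,12,13",
         "&s=_cr_engNameOrderBy:asc,fiscal_year:desc,_grIt_code:asc")
}

# Reproduces the URL from the question:
build_un_url(101, 826)
```

The resulting string can then be passed to GET in the same way as the hard-coded URL.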
Upvotes: 1