Tahnoon Pasha

Reputation: 6018

Download ASPX page with R

There are a number of fairly detailed answers on SO that cover logging in to an authenticated aspx site and downloading from it. As a complete n00b, though, I haven't been able to find a simple explanation of how to get data from a web form.

The following MWE is intended as an example only; the question is really meant to teach me how to do this for a wider collection of web pages.

Website:

http://data.un.org/Data.aspx?d=SNA&f=group_code%3a101

What I tried, which (obviously) failed:

test=read.csv('http://data.un.org/Handlers/DownloadHandler.ashx?DataFilter=group_code:101;country_code:826&DataMartId=SNA&Format=csv&c=2,3,4,6,7,8,9,10,11,12,13&s=_cr_engNameOrderBy:asc,fiscal_year:desc,_grIt_code:asc')

This gives me gobbledygook when I look at it with View(test).

Anything that steps me through this or points me in the right direction would be very gratefully received.

Upvotes: 0

Views: 2204

Answers (2)

user1609452

Reputation: 4444

The URL you are passing to read.csv returns a zipped file, not plain CSV text. You could download it with, say, httr, write the contents to a temporary file, unzip it and then read the CSV:

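 library(httr)
 urlUN <- "http://data.un.org/Handlers/DownloadHandler.ashx?DataFilter=group_code:101;country_code:826&DataMartId=SNA&Format=csv&c=2,3,4,6,7,8,9,10,11,12,13&s=_cr_engNameOrderBy:asc,fiscal_year:desc,_grIt_code:asc"
 response <- GET(urlUN)
 stop_for_status(response)                   # stop if the server returned an HTTP error
 zipFile <- tempfile(fileext = ".zip")       # temp path; no "temp/" folder has to exist
 writeBin(content(response, as = "raw"), zipFile)
 fName <- unzip(zipFile, list = TRUE)$Name   # names of the file(s) inside the archive
 unzip(zipFile, exdir = tempdir())           # extract into R's session temp directory
 read.csv(file.path(tempdir(), fName[1]))    # read the first extracted file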
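
read.csv() does not unpack ZIP archives, so pointing it straight at the handler makes it parse compressed bytes as text, which is where the gobbledygook comes from. A minimal sketch of confirming this, assuming you already have the response body as a raw vector (e.g. from content(response, as = "raw")): a ZIP archive that contains files starts with the local-file-header signature "PK\003\004". The helper name looks_like_zip() is made up for illustration and is not part of httr or base R.

```r
# Hypothetical helper: TRUE if a raw vector begins with the ZIP
# local-file-header signature "PK\003\004" (bytes 0x50 0x4B 0x03 0x04).
looks_like_zip <- function(bytes) {
  sig <- as.raw(c(0x50, 0x4b, 0x03, 0x04))
  length(bytes) >= 4 && identical(bytes[1:4], sig)
}

# Plain CSV text does not carry the signature
looks_like_zip(charToRaw("Country,Year,Value\n"))        # FALSE
# Bytes that start with the signature are flagged as a ZIP archive
looks_like_zip(as.raw(c(0x50, 0x4b, 0x03, 0x04, 0x14)))  # TRUE
```

Run on the bytes downloaded above, looks_like_zip(content(response, as = "raw")) should return TRUE if the handler behaves as this answer describes; a FALSE would suggest the server sent something else (an HTML error page, for instance).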

Alternatively, Hmisc has a useful getZip function:

 library(Hmisc)
 urlUN <- "http://data.un.org/Handlers/DownloadHandler.ashx?DataFilter=group_code:101;country_code:826&DataMartId=SNA&Format=csv&c=2,3,4,6,7,8,9,10,11,12,13&s=_cr_engNameOrderBy:asc,fiscal_year:desc,_grIt_code:asc"
 unData <- read.csv(getZip(urlUN))

Upvotes: 2

Adam Hyland

Reputation: 1057

The links are generated dynamically. The other problem is that the content isn't actually at that link: you're making a request to a (very odd and poorly documented) API which eventually responds with the zip file. If you open the Network tab in Chrome's developer tools as you click on that link, you'll see the request and response headers.

There are a few ways to solve this. If you know some JavaScript, you can script a headless WebKit instance such as PhantomJS to load these pages, simulate click events and wait for the content response, then pipe that to something else.

Alternatively, you may be able to finagle httr into treating this like a proper RESTful API, though I have no idea whether that's even remotely possible. :)
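
As a minimal sketch of that idea, assuming the handler's query parameters work the way they appear to in the question's URL, the request can be assembled from named parts so that other filters can be swapped in. The helper un_download_url() and its argument names are hypothetical; the parameter names (DataFilter, DataMartId, Format, c, s) are copied from the URL, and what each one controls is inferred, not documented.

```r
# Hypothetical helper: rebuild the DownloadHandler URL from its parts.
# Meanings inferred from the question's URL: DataFilter = filters joined by ";",
# DataMartId = dataset id, Format = output format, c = column ids, s = sort keys.
# Values are pasted in as-is (not percent-encoded), matching the original URL.
un_download_url <- function(filter, mart, format = "csv", cols, sort) {
  query <- c(
    DataFilter = paste(filter, collapse = ";"),
    DataMartId = mart,
    Format     = format,
    c          = paste(cols, collapse = ","),
    s          = paste(sort, collapse = ",")
  )
  paste0("http://data.un.org/Handlers/DownloadHandler.ashx?",
         paste(names(query), query, sep = "=", collapse = "&"))
}

# Reproduces the URL used in the question
urlUN <- un_download_url(
  filter = c("group_code:101", "country_code:826"),
  mart   = "SNA",
  cols   = c(2:4, 6:13),
  sort   = c("_cr_engNameOrderBy:asc", "fiscal_year:desc", "_grIt_code:asc")
)
```

The resulting urlUN can be fed to the httr or getZip code in the other answer. Changing country_code:826 (the ISO 3166-1 numeric code for the United Kingdom) to another code may return a different country's data, but that is untested here.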

Upvotes: 1
