user3271783

Reputation: 113

Download file with R given a JavaScript Statement

I want to create an R script that, among other things, downloads baseball player projection data from http://www.fangraphs.com/projections.aspx?pos=all&stats=bat&type=zips. There is a link near the top right corner of the data table that exports the data to .csv, but it appears to be a JavaScript command (javascript:__doPostBack('ProjectionBoard1$cmdCSV','')). I am familiar with using download.file() when given a direct link to a .csv file, but am not sure how to approach this.

How can I use R to extract this data?

Upvotes: 6

Views: 2223

Answers (2)

Davi Moreira

Reputation: 953

I had a similar problem trying to download several .pdf files. The solution I found is the following:

[1] Get all .pdf links, like this one:

link <- "http://www.biblioteca.presidencia.gov.br/presidencia/ex-presidentes/luiz-inacio-lula-da-silva/discursos/1o-mandato/2003/01-01-pronun-do-presidente-da-republica-luiz-inacio-lula-da-silva-na-sessao-solene-de-posse-no-cn.pdf" 

[2] Instead of using the download.file() function, use browseURL(), like this:

browseURL(link, browser = getOption("browser"),
        encodeIfNeeded = FALSE)

[3] The browseURL() function makes your browser open the file, and the browser can then automatically save the .pdf to its download directory (a rough sketch of the whole flow is shown below). If you are using Google Chrome, you can follow these steps:

https://www.computerhope.com/issues/ch001114.htm
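
For example, a minimal sketch of that flow, assuming the .pdf links all sit on a single listing page and using rvest to collect them (the listing URL here is a placeholder, not the real site structure):

library(rvest)

# Hypothetical listing page that links to the .pdf files
index_url <- "http://example.com/speeches/2003"

# [1] Collect every href that ends in .pdf
links <- read_html(index_url) %>%
  html_nodes("a") %>%
  html_attr("href")
pdf_links <- links[grepl("\\.pdf$", links)]

# [2] Open each one in the browser, which saves it to the download folder
for (link in pdf_links) {
  browseURL(link, browser = getOption("browser"), encodeIfNeeded = FALSE)
  Sys.sleep(1)  # give the browser a moment between downloads
}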

Upvotes: 0

Spacedman

Reputation: 94192

The download isn't a simple response that can be retrieved with download.file. The web page constructs a FORM with some huge parameters that store the state of the page, then passes these (and a load of cookies too) to the server to get the CSV response.

To make this work in R (or any other programming language), you need to reproduce that request, which you can usually only do by first getting the web page, scraping the FORM parameters (and cookies), and then constructing the precise POST request that your browser made when you clicked the link.

This might be possible with RCurl, and it is sometimes easier if you have a browser that can save the POST request parameters from its developer tools, so that RCurl can reuse them.
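
For illustration, here is a rough sketch of that scrape-then-POST approach using the httr and rvest packages rather than RCurl. The hidden field names (__VIEWSTATE, __EVENTVALIDATION) are standard ASP.NET post-back fields, but the exact set fangraphs.com requires is an assumption, so inspect the real form before relying on this:

library(httr)
library(rvest)

url <- "http://www.fangraphs.com/projections.aspx?pos=all&stats=bat&type=zips"

# First GET the page to pick up the session cookies and the hidden
# fields that store the page state
page <- GET(url)
html <- read_html(content(page, as = "text"))
hidden <- function(name) {
  html_attr(html_node(html, sprintf("input[name='%s']", name)), "value")
}

# Replay the __doPostBack('ProjectionBoard1$cmdCSV', '') call as a POST
resp <- POST(url, encode = "form", body = list(
  `__EVENTTARGET`     = "ProjectionBoard1$cmdCSV",
  `__EVENTARGUMENT`   = "",
  `__VIEWSTATE`       = hidden("__VIEWSTATE"),
  `__EVENTVALIDATION` = hidden("__EVENTVALIDATION")
))

# If the server accepted the post-back, the body should be the CSV
projections <- read.csv(text = content(resp, as = "text"))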

Another common technique in web scraping is to drive a real browser from a scripting language. There's an R package, RSelenium, that leverages Selenium and might be able to do this:

http://cran.r-project.org/web/packages/RSelenium/index.html
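
A rough sketch of that route, assuming a recent RSelenium (with rsDriver()) and assuming ASP.NET renders the export link with client id ProjectionBoard1_cmdCSV (check the page source for the real id):

library(RSelenium)

# Start a Selenium server plus a browser session
rD    <- rsDriver(browser = "firefox")
remDr <- rD$client

remDr$navigate("http://www.fangraphs.com/projections.aspx?pos=all&stats=bat&type=zips")

# Click the export link so the browser itself performs the post-back;
# the CSV lands in the browser's normal download folder
csv_link <- remDr$findElement(using = "id", "ProjectionBoard1_cmdCSV")
csv_link$clickElement()

remDr$close()
rD$server$stop()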

There are some related (but not duplicate) Q's here, such as:

How to use R to download a zipped file from a SSL page that requires cookies

An R-help posting from a couple of years ago has some suggestions too:

https://stat.ethz.ch/pipermail/r-help//2012-September/335769.html

Upvotes: 1
