MidnightDataGeek

Reputation: 938

Extracting .csv files from a website

I want to extract some data from a website I subscribe to. I can extract the .csv files manually, but there is a file for each day and I want five years' worth of data, so it will take too long.

I have used rvest to log into the site, but to download the data I need to manually click a button. How can I do this within R?

When I do it manually, the file goes into my downloads folder, which is totally fine since I can do a mass import, but if I could just load the data straight into R, that would be a little easier.

Here is what I have so far:

library(rvest)

url       <- "http://www.horseracebase.com/index.php/"
pgsession <- html_session(url)               ## create session
pgform    <- html_form(pgsession)[[1]]       ## pull form from session

filled_form <- set_values(pgform,
                          `login`    = "xxx",
                          `password` = "yyy")

submit_form(pgsession, filled_form)

This gets me logged in (I think), but now I don't know how to extract the data.

I do the same thing on Betfair where I use something like:

df <- read.csv("http://www.someurl.com/betfairdata.csv")

This works fine, but all their files are listed on the actual page, so no clicking of buttons is required.

Is there any way to interact with the button using rvest, or is there a way of finding the correct URL so I can just use read.csv as above?

Thanks

Upvotes: 2

Views: 10348

Answers (1)

Hack-R

Reputation: 23231

I created a free account and examined the website.

It looks like you're conflating .csv files with HTML tables. Nowhere on the site, as far as I can find, are there any .csv files. When you say "there's no clicking" and that they "display the .csv files", what you're actually describing is an HTML table.

On a side note, there are also no .csv files in the website's backend. The backend is a relational database, which powers their many filters and search features.

Having said all that, there are plenty of resources on how to scrape XML and HTML tables using rvest and other R packages: in the package documentation, in SO answers, and on various blogs (probably in Stack Overflow Documentation too, though I haven't checked). I will quote one example from Stats and Things, but note that html() has been deprecated in favor of read_html(), though both still work:

library("rvest")
url <- "http://en.wikipedia.org/wiki/List_of_U.S._states_and_territories_by_population"
population <- url %>%
  html() %>%
  html_nodes(xpath='//*[@id="mw-content-text"]/table[1]') %>%
  html_table()
population <- population[[1]]

head(population)

You can use the built-in Developer Tools in Chrome or Firefox to identify the part of the page that you need to extract. You can also use third-party tools like Firebug, but that's not really necessary.

For example:

  1. In your browser, log in and click Research
  2. Click Statistics
  3. Click Jockey 7 Day Performance
  4. Open Developer Tools
  5. Click the button that says "Select an element to inspect it"
  6. Hover over or click on the HTML table
  7. The highlighted source code corresponds to the table; right-click it
  8. Click "Copy XPath"
  9. Your clipboard now has the target; in this example it's /html/body/table[2]/tbody/tr/td/table[2], which is plugged into the sketch below
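Once you have the XPath, you can plug it into the same kind of pipeline, reusing the session you logged in with. Here is a minimal sketch using the older rvest API from your question (html_session()/submit_form()/jump_to()); the stats-page URL is a placeholder you would copy from your browser's address bar once you're on the Jockey 7 Day Performance page, and the exact session helpers may differ slightly depending on your rvest version:

library(rvest)

## log in exactly as in the question (older rvest API)
url       <- "http://www.horseracebase.com/index.php/"
pgsession <- html_session(url)
pgform    <- html_form(pgsession)[[1]]

filled_form <- set_values(pgform,
                          `login`    = "xxx",
                          `password` = "yyy")

logged_in <- submit_form(pgsession, filled_form)

## jump to the stats page inside the authenticated session
## (placeholder URL -- copy the real one from your browser's address bar)
stats_page <- jump_to(logged_in, "http://www.horseracebase.com/horseracing/jockeystats.php")

## extract the table with the XPath copied from Developer Tools
jockey_7day <- stats_page %>%
  html_nodes(xpath = "/html/body/table[2]/tbody/tr/td/table[2]") %>%
  html_table(fill = TRUE)

jockey_7day <- jockey_7day[[1]]
head(jockey_7day)

If the page exposes the date as a query parameter, you can loop over dates with jump_to() and rbind the results to build up your five years of data.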

Of course, if you want to save your parsed results to a .csv (or any other kind of) file, you can do that once you have them in a data.frame, data.table, or other flat object:

write.csv(population, "population.csv", row.names = FALSE)

Note that some people find it easier to scrape tables with readHTMLTable() from the XML package, though both can do the job.
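If you go that route, here is a minimal sketch against the same Wikipedia page. Note that readHTMLTable() returns every table on the page as a list of data frames, and that it can't fetch https URLs directly (for https pages you'd download the HTML first, e.g. with httr, and pass the text in):

library(XML)

url    <- "http://en.wikipedia.org/wiki/List_of_U.S._states_and_territories_by_population"
tables <- readHTMLTable(url, stringsAsFactors = FALSE)

## the population table should be the first one in the list;
## check str(tables) or names(tables) if it isn't
population <- tables[[1]]
head(population)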

Upvotes: 3
