I want to use R code to download the file that you get by clicking the "download" button on this site: https://ivo.gascade.biz/ivo/capacities?9
Clicking "download" sends a GET request to https://ivo.gascade.biz/ivo/capacities?reportparameterselect_hf_0=&9-2.IFormSubmitListener-form=&netpoint=6800&flowDirection=EXIT&from=08%2F05%2F2019&to=06%2F05%2F2021&fileType=1&download=Download
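That request is just the form fields URL-encoded into a query string. A minimal sketch of rebuilding it in base R, under stated assumptions: the helper name build_capacity_url is hypothetical, the field names are copied from the URL above, and the "9-2.IFormSubmitListener-form" key looks tied to the current page, so it is passed in as an argument rather than treated as stable.

```r
# Hypothetical helper: rebuild the GET URL that the "download" button sends.
# Field names are copied from the observed request; the listener key appears
# to be page/session specific, so it is a parameter here.
build_capacity_url <- function(netpoint, flow_direction, from, to,
                               file_type = 1,
                               listener = "9-2.IFormSubmitListener-form",
                               base = "https://ivo.gascade.biz/ivo/capacities") {
  keys <- c("reportparameterselect_hf_0", listener, "netpoint", "flowDirection",
            "from", "to", "fileType", "download")
  values <- c("", "", as.character(netpoint), flow_direction,
              from, to, as.character(file_type), "Download")
  # reserved = TRUE also percent-encodes "/" in the dates (08/05/2019 -> 08%2F05%2F2019)
  encoded <- vapply(values, utils::URLencode, character(1),
                    reserved = TRUE, USE.NAMES = FALSE)
  paste0(base, "?", paste0(keys, "=", encoded, collapse = "&"))
}
```

For example, build_capacity_url(6800, "EXIT", "08/05/2019", "06/05/2021") reproduces the request URL above. Building the URL correctly is not enough on its own, though, because the server also checks the session (see the answer).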
but when I use:
url <- "https://ivo.gascade.biz/ivo/capacities?reportparameterselect_hf_0=&9-2.IFormSubmitListener-form=&netpoint=6800&flowDirection=EXIT&from=08%2F05%2F2019&to=06%2F05%2F2021&fileType=1&download=Download"
download.file(url, destfile = "myfile.csv")
then I only download HTML junk instead of the data. Any suggestions on how to get the file with R code?
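A quick way to detect this symptom in a script is to look at the first bytes of the downloaded file: a CSV export should not start with "<", while an HTML page does. A minimal sketch; the helper name looks_like_html is hypothetical.

```r
# Hypothetical helper: TRUE if the file begins (after optional whitespace)
# with "<", i.e. it looks like an HTML/XML page rather than CSV data.
looks_like_html <- function(path, n_bytes = 512L) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  head_bytes <- readBin(con, what = "raw", n = n_bytes)
  # drop NUL bytes so rawToChar() cannot fail on binary content
  head_text <- rawToChar(head_bytes[head_bytes != as.raw(0)])
  grepl("^[[:space:]]*<", head_text, useBytes = TRUE)
}
```

Usage: after download.file(), a call such as if (looks_like_html("myfile.csv")) stop("got an HTML page, not the data") makes the failure explicit instead of silently passing the page on to read.csv().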
What is strange is that this returns an empty string (""):
RCurl::getURL("https://ivo.gascade.biz/ivo/capacities?9")
The server expects a cookie associated with a live session. The request URLs also appear to differ from one request to the next, even when the requested data are the same, while the cookies stay the same. If you have a live session in your browser, you can find the JSESSIONID cookies and the current request URL under the request headers in the Network tab of the browser's developer tools. Use that URL as url and pass the cookies to the headers argument of download.file() as a named vector:
cookie <- "JSESSIONID=5BD17…; JSESSIONID=57D9…"
download.file(url, "myfile.csv", headers = c("Cookie" = cookie))
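If you copy the cookies as separate name/value pairs, a small sketch like the following assembles the Cookie header for you. Both helper names are hypothetical, and the cookie values themselves still have to come from a live browser session as described above.

```r
# Hypothetical helper: join named cookie values into one Cookie header value,
# e.g. c(JSESSIONID = "abc") -> "JSESSIONID=abc".
make_cookie_header <- function(cookies) {
  stopifnot(is.character(cookies), !is.null(names(cookies)))
  paste(names(cookies), cookies, sep = "=", collapse = "; ")
}

# Hypothetical wrapper: download in binary mode while sending the session cookies
# (the headers argument of download.file() requires R >= 3.6.0).
download_with_cookies <- function(url, destfile, cookies) {
  utils::download.file(url, destfile, mode = "wb",
                       headers = c(Cookie = make_cookie_header(cookies)))
}
```

Duplicate names are allowed in the vector, so two JSESSIONID cookies can be passed as c(JSESSIONID = "...", JSESSIONID = "...").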
However, this only seems to work while the page of interest is open in a browser and you've already filled out the form and clicked download, which obviously isn't very practical. I think your best bet in this case is to use a webdriver like RSelenium, which allows you to simulate browser activity programmatically.
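A rough RSelenium sketch of that approach, under loud assumptions: RSelenium is installed and rsDriver() can start Firefox on your machine; the form field names ("netpoint", "flowDirection", "from", "to", "download") are guessed from the query string in the question; netpoint and flowDirection are assumed to be select elements and the dates plain text inputs. None of this has been checked against the live page, and the file lands in the browser's own download folder.

```r
# Sketch only: field names and element types below are assumptions inferred
# from the question's query string, not verified against the site.
download_via_browser <- function(netpoint, flow_direction, from, to,
                                 url = "https://ivo.gascade.biz/ivo/capacities",
                                 wait_seconds = 5) {
  driver <- RSelenium::rsDriver(browser = "firefox", verbose = FALSE)
  on.exit({
    driver$client$close()   # close the browser session
    driver$server$stop()    # stop the Selenium server process
  })
  client <- driver$client
  client$navigate(url)

  # assumed <select>: click the option whose value matches
  pick_option <- function(name, value) {
    xpath <- sprintf("//select[@name='%s']/option[@value='%s']", name, value)
    client$findElement(using = "xpath", value = xpath)$clickElement()
  }
  # assumed text input: clear it and type the value
  type_text <- function(name, value) {
    el <- client$findElement(using = "name", value = name)
    el$clearElement()
    el$sendKeysToElement(list(value))
  }

  pick_option("netpoint", as.character(netpoint))
  pick_option("flowDirection", flow_direction)
  type_text("from", from)
  type_text("to", to)
  client$findElement(using = "name", value = "download")$clickElement()
  Sys.sleep(wait_seconds)   # crude wait for the browser to save the file
  invisible(NULL)
}
```

The fixed Sys.sleep() is the simplest option; polling the download folder for the new file would be more robust.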
There might also be a way to create a more persistent session using httr and adding some more header parameters (e.g. keep-alive), but I suspect RSelenium is the better choice here.
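For the httr route, a minimal sketch, assuming httr is installed: httr reuses one connection handle per host by default, so a cookie set by the first request (such as JSESSIONID) is sent automatically with later requests to the same host. The helper name is hypothetical, and because the form URL appears to change per request, download_url would likely have to be read from the fetched page rather than hard-coded; that step is not shown.

```r
# Sketch, assuming httr is installed. Both requests go to the same host, so
# httr's pooled handle carries the session cookie from the first to the second.
download_in_session <- function(page_url, download_url, destfile) {
  # 1. Open the form page to obtain a session cookie
  page <- httr::GET(page_url)
  httr::stop_for_status(page)
  # 2. Request the export in the same session and stream it to disk
  resp <- httr::GET(download_url, httr::write_disk(destfile, overwrite = TRUE))
  httr::stop_for_status(resp)
  if (identical(httr::http_type(resp), "text/html")) {
    warning("Server returned HTML instead of a data file; ",
            "the form URL is probably stale for this session.")
  }
  invisible(resp)
}
```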