Reputation: 640
I have a url:
url <- "http://www.railroadpm.org/home/RPM/Performance%20Reports/BNSF.aspx"
That page has an "Export to CSV" link to a csv file that I would like to download. The problem is that the link is not a regular URL; it triggers JavaScript instead. What I would like to do is follow the link and create a data frame from the csv file. The JavaScript is:
javascript:__doPostBack('ctl11$btnCSV','')
and from that I can tell that the id is
"ctl11_btnCSV"
but I am unsure of how this fits into RCurl, which from what I've read on SO seems to be the best way to access this data. Any help would be appreciated.
Thanks.
Upvotes: 0
Views: 237
Reputation: 78792
There was zero effort put into this question (esp. since the OP came to the conclusion that RCurl is the current best practice for web wrangling in R), but any time an SO web scraping question involving a SharePoint site can actually be answered (Microsoft SharePoint is one of the worst things ever invented, next to Windows), it's worth posting an answer.
library(rvest)
library(httr)

# make an initial connection to get cookies
httr::GET(
  "http://www.railroadpm.org/home/RPM/Performance%20Reports/BNSF.aspx"
) -> res

# retrieve some hidden bits we need to pass b/c SharePoint is a wretched thing.
pg <- content(res, as = "parsed")
for_post <- html_nodes(pg, "input[type='hidden']")

# post the hidden form & save out the CSV
httr::POST(
  "http://www.railroadpm.org/home/RPM/Performance%20Reports/BNSF.aspx",
  body = as.list(
    c(
      # hidden fields sent back as a named vector (field id -> value)
      setNames(
        html_attr(for_post, "value"),
        html_attr(for_post, "id")
      ),
      # the control that __doPostBack() names for the "Export to CSV" link
      `__EVENTTARGET` = "ctl11$btnCSV"
    )
  ),
  write_disk("measures.csv"),
  progress()
) -> res
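The POST only saves the file to disk, while the question asked for a data frame. Here is a minimal sketch of that last step, assuming the export is a plain comma-separated file with a header row (I have not checked this against the live site). `read_export` is a hypothetical helper name, not part of httr or rvest; call it with whatever file name you passed to `write_disk()`.

```r
# Hypothetical helper: read the CSV saved by write_disk() into a data frame.
# Assumes a comma-separated file with a header row.
read_export <- function(path) {
  stopifnot(file.exists(path))
  read.csv(
    path,
    stringsAsFactors = FALSE, # keep text columns as character
    check.names = FALSE       # keep the column names exactly as exported
  )
}
```

`check.names = FALSE` is used because exported headers often contain spaces, which `read.csv()` would otherwise rewrite into syntactic names.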
Upvotes: 2