Reputation: 578
I am trying to use the RSelenium package in R to scrape the following page: http://www.wbsec.gov.in/(S(njkinc55hbv2hw55xksxdv45))/DetailedResult/Detailed_gp.aspx. I am interested in all combinations of the drop-down selections, but I keep getting the following error:
Couldnt connect to host on http://localhost:4444/wd/hub.Please ensure a Selenium server is running.
Error in queryRD(paste0(serverURL, "/session"), "POST", qdata = toJSON(serverOpts)) :
Here is the code I am running:
library(RSelenium)
library(XML)
library(magrittr)
checkForServer()
startServer()
remDrv<-remoteDriver()
remDrv$open()
remDrv$navigate("http://www.wbsec.gov.in/(S(njkinc55hbv2hw55xksxdv45))/DetailedResult/Detailed_gp.aspx")
Any help would be appreciated.
Upvotes: 1
Views: 1875
Reputation: 93
You don't seem to have set up Selenium properly. Make sure you have the Selenium server downloaded and running, and the RSelenium package loaded in R. This link might be helpful.
Once Selenium is set up properly, all you have to do is find the CSS selectors (SelectorGadget is a great tool for this), send the required values to the dropdowns, scrape the page, and repeat. I would do this for each of the three dropdowns.
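As a rough sketch of the loop described above, something like the following could work. It assumes a Selenium server is already listening on localhost:4444 and that the three dropdowns are <select> elements named ddldistrict, ddlblock and ddlgp; the helper names (option_selector, dropdown_values, choose_option) are made up for illustration, and the fixed wait is a guess rather than a tested value.

```r
# Sketch only: assumes a running Selenium server and <select> elements
# whose name attributes are ddldistrict, ddlblock and ddlgp.
# RSelenium is only needed to create the remDr object passed in.

# Build a CSS selector for one <option> inside a named <select>.
option_selector <- function(select_name, value) {
  sprintf("select[name='%s'] option[value='%s']", select_name, value)
}

# Read the value attribute of every <option> in a named <select>.
dropdown_values <- function(remDr, select_name) {
  opts <- remDr$findElements(
    using = "css selector",
    value = sprintf("select[name='%s'] option", select_name)
  )
  vapply(opts, function(o) o$getElementAttribute("value")[[1]], character(1))
}

# Click one option, pause so the page can refresh the next dropdown,
# and return the page source for scraping.
choose_option <- function(remDr, select_name, value, wait = 2) {
  opt <- remDr$findElement(
    using = "css selector",
    value = option_selector(select_name, value)
  )
  opt$clickElement()
  Sys.sleep(wait)  # crude fixed wait, chosen arbitrarily for this sketch
  invisible(remDr$getPageSource()[[1]])
}
```

With an open driver (remDr <- RSelenium::remoteDriver(); remDr$open(); remDr$navigate(url)), you would call dropdown_values(remDr, "ddldistrict") to list the districts, then choose_option() for each one before reading the next dropdown.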
Upvotes: 1
Reputation: 78842
Use an intermediary such as Burp Suite to capture what's going on, and use the results in combination with rvest's html_session and/or httr's POST.
In this case, you'd see that your original URL contains the initial <select> menu, and that selecting an option issues a POST to:
http://www.wbsec.gov.in/(S(njkinc55hbv2hw55xksxdv45))/DetailedResult/Detailed_gp.aspx
with a number of the hidden variables from the original form element, as well as ddldistrict, ddlblock and ddlgp. The response contains the options for the subsequent <select> menu.
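To make the "hidden variables plus dropdown value" idea concrete, here is a minimal sketch of how the POST body could be assembled with rvest. The HTML below is a stand-in written for illustration, not the real page, and the hidden field names in it (__VIEWSTATE, __EVENTVALIDATION) are only an assumption about what this form carries; a captured request would show the actual fields. hidden_fields is a hypothetical helper name.

```r
library(rvest)

# Collect the named hidden <input> fields of a page as a named list.
hidden_fields <- function(page) {
  inputs <- html_elements(page, "input[type='hidden'][name]")
  vals <- html_attr(inputs, "value")
  vals[is.na(vals)] <- ""  # hidden inputs without a value are sent as empty
  as.list(setNames(vals, html_attr(inputs, "name")))
}

# Stand-in HTML for illustration only; the real page will differ.
demo <- read_html('
  <form>
    <input type="hidden" name="__VIEWSTATE" value="abc123">
    <input type="hidden" name="__EVENTVALIDATION" value="xyz789">
    <select name="ddldistrict">
      <option value="0">Select</option>
      <option value="1">District A</option>
    </select>
  </form>')

# Hidden fields plus the chosen dropdown value make up the form body.
body <- c(hidden_fields(demo), list(ddldistrict = "1"))
str(body)
```

A list like body can then be sent with httr::POST(url, body = body, encode = "form"), where url is the Detailed_gp.aspx address.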
Use rvest to get the value attribute of each option in a dropdown, and make subsequent POSTs to the Detailed_gp.aspx URL until you've got all the combinations.
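The walk through the three dropdowns could be sketched roughly as follows. This is untested against the real site and makes several assumptions: that the <select> elements are named ddldistrict, ddlblock and ddlgp, that resending every hidden input is enough for the server to accept the POST, and that placeholder options have an empty value (one with a value such as "0" would need filtering out too). The function names are hypothetical.

```r
library(rvest)
library(httr)

# Value attributes of the <option>s in a named <select>, dropping blanks.
option_values <- function(page, select_name) {
  opts <- html_elements(page, sprintf("select[name='%s'] option", select_name))
  vals <- html_attr(opts, "value")
  vals[!is.na(vals) & nzchar(vals)]
}

# POST one selection back to the form, carrying over the hidden fields
# of the page just received, and return the parsed response.
post_selection <- function(url, page, selection) {
  hidden <- html_elements(page, "input[type='hidden'][name]")
  vals <- html_attr(hidden, "value")
  vals[is.na(vals)] <- ""
  fields <- as.list(setNames(vals, html_attr(hidden, "name")))
  resp <- POST(url, body = c(fields, selection), encode = "form")
  stop_for_status(resp)
  read_html(content(resp, as = "text", encoding = "UTF-8"))
}

# Walk district -> block -> gp and record every combination found.
all_combinations <- function(url) {
  start <- read_html(url)
  rows <- list()
  for (d in option_values(start, "ddldistrict")) {
    p1 <- post_selection(url, start, list(ddldistrict = d))
    for (b in option_values(p1, "ddlblock")) {
      p2 <- post_selection(url, p1, list(ddldistrict = d, ddlblock = b))
      for (g in option_values(p2, "ddlgp")) {
        rows[[length(rows) + 1]] <- data.frame(district = d, block = b, gp = g)
      }
    }
  }
  do.call(rbind, rows)  # one row per combination, or NULL if none found
}
```

Calling all_combinations() with the Detailed_gp.aspx URL would then return a data frame of district, block and gp values. If the server rejects the requests, compare the body being sent with the request captured in the proxy to see which fields are missing.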
You'll probably get a Selenium answer, but this problem only requires posting to forms, which is something httr and rvest excel at.
Upvotes: 2