monarque13

Reputation: 578

Scraping page with drop-down menus in R

I am trying to use the RSelenium package in R to scrape the following page: http://www.wbsec.gov.in/(S(njkinc55hbv2hw55xksxdv45))/DetailedResult/Detailed_gp.aspx. I am interested in every combination of the drop-down selections, but I keep getting the following error:

Couldnt connect to host on http://localhost:4444/wd/hub.Please ensure a Selenium server is running.
Error in queryRD(paste0(serverURL, "/session"), "POST", qdata = toJSON(serverOpts)) :

 library(RSelenium)
 library(XML)
 library(magrittr)

 checkForServer()
 startServer()
 remDrv<-remoteDriver()
 remDrv$open()
 remDrv$navigate("http://www.wbsec.gov.in/(S(njkinc55hbv2hw55xksxdv45))/DetailedResult/Detailed_gp.aspx")

Any help would be appreciated.

Upvotes: 1

Views: 1875

Answers (2)

Boaz Sobrado

Reputation: 93

You don't seem to have set up Selenium properly. The error says nothing is listening on localhost:4444, so make sure the Selenium server is downloaded and running and that RSelenium is loaded in R. This link might be helpful.

Once Selenium is set up properly, all you have to do is find the CSS selectors (SelectorGadget is a great tool for this), send the required values to the dropdowns, scrape the page, and repeat. I would do this for each of the three dropdowns.
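As a rough illustration of the select-and-scrape step, here is a minimal sketch. It assumes the dropdowns can be matched by a name attribute such as ddldistrict (the field name mentioned in the other answer; the real markup may differ) and that a fixed pause is long enough for the page to reload. option_css and choose_option are hypothetical helper names, not part of RSelenium.

```r
library(RSelenium)

# Build a CSS selector for one <option> inside a named <select>.
# Assumption: the <select> is identified by its name attribute.
option_css <- function(select_name, value) {
  sprintf("select[name='%s'] option[value='%s']", select_name, value)
}

# Click one option and return the page source after the page reloads.
# remDr is an already opened RSelenium remote driver.
choose_option <- function(remDr, select_name, value, wait = 2) {
  opt <- remDr$findElement(using = "css selector",
                           value = option_css(select_name, value))
  opt$clickElement()
  Sys.sleep(wait)  # crude fixed pause; a real scraper should poll instead
  remDr$getPageSource()[[1]]
}

# Usage (not run; needs a working Selenium server and browser):
# rD <- rsDriver(browser = "firefox")
# remDr <- rD$client
# remDr$navigate(url)
# html <- choose_option(remDr, "ddldistrict", "1")  # "1" is a made-up value
```

The returned source can be parsed with XML or rvest to read the option values of the next dropdown before repeating the step.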

Upvotes: 1

hrbrmstr

Reputation: 78842

Use an intercepting proxy such as Burp Suite to capture the requests the page makes, then reproduce them with rvest's html_session and/or httr's POST.

In this case, you'd see that the page at your original URL contains the initial <select> menu, and that selecting an option issues a POST to:

http://www.wbsec.gov.in/(S(njkinc55hbv2hw55xksxdv45))/DetailedResult/Detailed_gp.aspx

with a number of the hidden variables in the original form element as well as ddldistrict, ddlblock and ddlgp. The response contains the subsequent <select> menu options.

Use rvest to get the value attribute of each dropdown and make subsequent POSTs to the Detailed_gp.aspx URL until you've got all the combinations.
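A minimal sketch of that loop's building blocks follows. It assumes ddldistrict, ddlblock and ddlgp are the literal name attributes of the <select> elements, that placeholder options have an empty value, and that echoing back every hidden input is enough for the server to accept the POST; none of this is verified against the live site. hidden_fields, option_values and post_selection are hypothetical helper names.

```r
library(httr)
library(rvest)

form_url <- "http://www.wbsec.gov.in/(S(njkinc55hbv2hw55xksxdv45))/DetailedResult/Detailed_gp.aspx"

# Named list of all hidden <input> fields, ready to use as a POST body.
hidden_fields <- function(page) {
  inputs <- html_nodes(page, "input[type='hidden'][name]")
  values <- html_attr(inputs, "value")
  values[is.na(values)] <- ""  # inputs without a value attribute
  as.list(setNames(values, html_attr(inputs, "name")))
}

# value attributes of one dropdown's options, dropping empty placeholders.
option_values <- function(page, select_name) {
  css <- sprintf("select[name='%s'] option", select_name)
  values <- html_attr(html_nodes(page, css), "value")
  values[!is.na(values) & nzchar(values)]
}

# POST the hidden fields plus the chosen dropdown values; return the parsed reply.
post_selection <- function(page, selection) {
  body <- modifyList(hidden_fields(page), selection)
  resp <- POST(form_url, body = body, encode = "form")
  stop_for_status(resp)
  read_html(content(resp, as = "text", encoding = "UTF-8"))
}

# Usage (not run; hits the live site):
# start <- read_html(form_url)
# for (d in option_values(start, "ddldistrict")) {
#   blocks <- post_selection(start, list(ddldistrict = d))
#   for (b in option_values(blocks, "ddlblock")) {
#     gps <- post_selection(blocks, list(ddldistrict = d, ddlblock = b))
#     # option_values(gps, "ddlgp") gives the last level of combinations
#   }
# }
```

Each response is passed on as the page for the next request so that the hidden fields sent back are always the most recent ones the server issued.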

You'll probably get a Selenium answer, but this problem only requires posting to forms, which is something httr and rvest excel at.

Upvotes: 2
