LWRMS

RSelenium - How to scrape all drop-down list option values

How can all the option values from a drop-down list be scraped using RSelenium?

Sample of page source:

<select name="main$ddArea" onchange="javascript:setTimeout(&#39;__doPostBack(\&#39;main$ddArea\&#39;,\&#39;\&#39;)&#39;, 0)" id="main_ddArea" class="groupTextBox">
<option selected="selected" value="95182">Area 1</option>
<option value="95183">Area 2</option>
<option value="95184">Area 3</option>
<option value="95185">Area 4</option>
<option value="95186">Area 4</option>
</select>

The desired result is a vector with one value per element, for example values = c("95182", "95183", "95184", "95185", "95186").

A single string containing all the values would probably also work, since it could be split into elements, e.g., with strsplit().

Calling getElementAttribute() on the select element with 'value' or 'option' does not return the option values. For example:

dd.areas = remDr$findElement(using='id', value="main_ddArea")
dd.areas$getElementAttribute('option')

or

dd.areas$getElementAttribute('value')

getElementText() returns a single string with all the option text, e.g., "Area 1\nArea 2\nArea 3\n...". However, that text can't be used later to navigate the drop-down list: when navigating it with $findElement(), an option's value is needed to select it, and the text does not work.

Neither the package documentation nor the vignette appears to mention drop-down lists.
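To illustrate what "navigating by value" can look like, here is a minimal sketch (not tested against the asker's site) that locates the `<option>` whose `value` attribute matches and clicks it. `option_xpath()` and `select_by_value()` are hypothetical helper names, and `remDr` is assumed to be an open RSelenium `remoteDriver` session.

```r
# Hypothetical helper: build an XPath matching one <option> of the <select>
# with the given id, selected by the option's value attribute
option_xpath <- function(select_id, value) {
  sprintf("//select[@id='%s']/option[@value='%s']", select_id, value)
}

# Hypothetical helper: find that option in an open RSelenium session
# (remDr is assumed to be a connected remoteDriver) and click it
select_by_value <- function(remDr, select_id, value) {
  opt <- remDr$findElement(using = "xpath", value = option_xpath(select_id, value))
  opt$clickElement()
  invisible(opt)
}

# Usage (requires a running Selenium server):
# select_by_value(remDr, "main_ddArea", "95183")
```

Because this page's `onchange` handler triggers a postback, elements found before the click may no longer be valid afterwards, so it is probably safest to look them up again.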

Answers (1)

jdharrison

You can use findElement to target the select tag, then get its outerHTML and parse the resulting HTML:

library(XML)  # htmlParse() and xmlGetAttr() come from the XML package
# remDr is an open RSelenium remoteDriver session
remDr$navigate("https://www.tutorialspoint.com/html/html_select_tag.htm")
webElem <- remDr$findElement("name", "dropdown")
appHTML <- webElem$getElementAttribute("outerHTML")[[1]]
doc <- htmlParse(appHTML, asText = TRUE)
doc["//option", fun = function(x) xmlGetAttr(x, "value")]

> doc["//option", fun = function(x) xmlGetAttr(x, "value")]
[[1]]
[1] "Data Structures"

[[2]]
[1] "Data Mining"

There were some recent issues with retrieving element attributes in Firefox, which appear when running Selenium Server 2 with a Gecko-based browser (see "GetAttribute of WebElement in Selenium Firefox Driver Returns Empty"). In such a case you can use JavaScript to get the attributes:

library(XML)  # htmlParse() and xmlGetAttr() come from the XML package
remDr$navigate("https://www.tutorialspoint.com/html/html_select_tag.htm")
webElem <- remDr$findElement("name", "dropdown")
# Read outerHTML through JavaScript instead of getElementAttribute()
jsScript <- "var element = arguments[0]; return element.outerHTML;"
appHTML <- remDr$executeScript(jsScript, list(webElem))[[1]]
doc <- htmlParse(appHTML, asText = TRUE)
doc["//option", fun = function(x) xmlGetAttr(x, "value")]

> doc["//option", fun = function(x) xmlGetAttr(x, "value")]
[[1]]
[1] "Data Structures"

[[2]]
[1] "Data Mining"
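If getElementAttribute() works with your driver, you could also skip the HTML parsing and request each `<option>` element directly with `findElements()`, reading its `value` attribute. This is a sketch that assumes the question's `main_ddArea` id; `get_option_values()` is a hypothetical helper name. It makes one round trip per option and would be affected by the same Firefox attribute issue described above.

```r
# Hypothetical helper: collect the value attribute of every <option>
# inside the <select> with the given id, using an open RSelenium session
get_option_values <- function(remDr, select_id) {
  opts <- remDr$findElements(using = "css selector",
                             value = sprintf("#%s option", select_id))
  # getElementAttribute() returns a list, so take its first element
  vapply(opts, function(el) el$getElementAttribute("value")[[1]], character(1))
}

# Usage (requires a running Selenium server and a navigated remDr):
# values <- get_option_values(remDr, "main_ddArea")
```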
