Reputation: 597
How can all the option values from a drop-down list be scraped using RSelenium?
Sample of page source:
<select name="main$ddArea" onchange="javascript:setTimeout('__doPostBack(\'main$ddArea\',\'\')', 0)" id="main_ddArea" class="groupTextBox">
<option selected="selected" value="95182">Area 1</option>
<option value="95183">Area 2</option>
<option value="95184">Area 3</option>
<option value="95185">Area 4</option>
<option value="95186">Area 4</option>
</select>
The result wanted is a vector with each value as an element. For example, values = c("95182", "95183", "95184", "95185", "95186")
Obtaining a string of the values would also likely work as it could be split into elements, e.g., using strsplit.
getElementAttribute() with 'value' or 'option' does not work. E.g.,
dd.areas = remDr$findElement(using='id', value="main_ddArea")
dd.areas$getElementAttribute('option')
or
dd.areas$getElementAttribute('value')
getElementText() returns the option text as one string, e.g., "Area 1\nArea 2\nArea 3\n...". But the text can't later be used to navigate the drop-down list. In other words, when navigating the drop-down list using findElement(), a value is needed to populate the drop-down list; the text does not work.
The package documentation does not appear to contain references to drop down lists and neither does the vignette.
Upvotes: 3
Views: 1720
Reputation: 30425
You can use findElement to target the select tag, then get the outerHTML and parse the resulting HTML:
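library(XML)  # provides htmlParse() and xmlGetAttr()
# remDr is assumed to be an already opened RSelenium remoteDriver session
remDr$navigate("https://www.tutorialspoint.com/html/html_select_tag.htm")
webElem <- remDr$findElement("name", "dropdown")
# getElementAttribute() returns a list; take the HTML string out of it
appHTML <- webElem$getElementAttribute("outerHTML")[[1]]
doc <- htmlParse(appHTML)
# pull the value attribute from every <option> node
doc["//option", fun = function(x) xmlGetAttr(x, "value")]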
> doc["//option", fun = function(x) xmlGetAttr(x, "value")]
[[1]]
[1] "Data Structures"
[[2]]
[1] "Data Mining"
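The call above returns a list with one element per option. If a plain character vector is wanted, as in the question, the XML package's xpathSApply() can simplify the result. This is a minimal sketch, assuming the XML package is installed. The appHTML string here is a hand-written stand-in for the outerHTML the browser would return, built from the question's sample markup with the onchange attribute dropped for brevity:

```r
library(XML)

# Stand-in for the outerHTML string returned by the browser (assumption:
# the question's sample markup, minus the onchange attribute).
appHTML <- paste0(
  '<select name="main$ddArea" id="main_ddArea" class="groupTextBox">',
  '<option selected="selected" value="95182">Area 1</option>',
  '<option value="95183">Area 2</option>',
  '<option value="95184">Area 3</option>',
  '<option value="95185">Area 4</option>',
  '<option value="95186">Area 4</option>',
  '</select>'
)

# asText = TRUE makes it explicit that the input is markup, not a file name
doc <- htmlParse(appHTML, asText = TRUE)

# xpathSApply() simplifies the per-node results to a character vector
values <- xpathSApply(doc, "//option", xmlGetAttr, "value")
labels <- xpathSApply(doc, "//option", xmlValue)
```

With a live session, appHTML would instead come from getElementAttribute("outerHTML") as shown above, and values could then be used wherever the option value is required.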
There were some recent issues with Firefox and getting element attributes, which appear when running Selenium Server 2 with a Gecko-based browser; see "GetAttribute of WebElement in Selenium Firefox Driver Returns Empty". In such a case, you can use JavaScript to get the attributes:
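library(XML)  # provides htmlParse() and xmlGetAttr()
# remDr is assumed to be an already opened RSelenium remoteDriver session
remDr$navigate("https://www.tutorialspoint.com/html/html_select_tag.htm")
webElem <- remDr$findElement("name", "dropdown")
# the web element is passed to the script as arguments[0]
jsScript <- "var element = arguments[0]; return element.outerHTML;"
appHTML <- remDr$executeScript(jsScript, list(webElem))[[1]]
doc <- htmlParse(appHTML)
# pull the value attribute from every <option> node
doc["//option", fun = function(x) xmlGetAttr(x, "value")]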
> doc["//option", fun = function(x) xmlGetAttr(x, "value")]
[[1]]
[1] "Data Structures"
[[2]]
[1] "Data Mining"
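If only the values are needed, the script itself could return them, which skips the HTML parsing step. This is a rough sketch that has not been run against a live browser: get_option_values is a hypothetical helper name, remDr is assumed to be an open remoteDriver, and webElem is assumed to be the select element found with findElement as above.

```r
# Hypothetical helper: ask the browser for the value of every <option>
# inside a <select> element and return them as a character vector.
get_option_values <- function(remDr, webElem) {
  js <- paste(
    "var sel = arguments[0]; var out = [];",
    "for (var i = 0; i < sel.options.length; i++) {",
    "  out.push(sel.options[i].value);",
    "}",
    "return out;"
  )
  # The returned JavaScript array is expected to arrive as an R list of
  # strings; unlist() flattens it to a character vector.
  unlist(remDr$executeScript(js, args = list(webElem)))
}

# Usage with a live session (not run here):
# webElem <- remDr$findElement("name", "dropdown")
# values  <- get_option_values(remDr, webElem)
```

The loop over sel.options is plain JavaScript that older browsers also understand; how the array is converted on the R side may differ between RSelenium versions, which is why the result is passed through unlist().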
Upvotes: 5