Reputation: 4278
I have the following HTML code :
<ul class="list" role="listbox" id="list1">
<li class="lvl2">
<div class="lvl3" id="lvl3-nb-1">
choice1
</div>
</li>
<li class="lvl2">
<div class="lvl3" id="lvl3-nb-2">
choice2
</div>
</li>
<li class="lvl2">
<div class="lvl3" id="lvl3-nb-3">
choice3
</div>
</li>
</ul>
I'd like to get some handle on the element whose text is "choice2": its outer HTML, the element itself, a selector, an XPath, it doesn't matter which.
How can I do that with RSelenium?
Thanks
EDIT for clarification: the ids of the elements of the list are dynamic (and basically random), so the solution I need cannot hard-code them in an XPath or CSS selector. However, I know for sure the values choice1, choice2 and choice3 (and basically everything else; for instance, I know the classes will be called list, lvl2 and lvl3).
Attempt at a reproducible example :
HTML :
<ul class="list" id="list1">
<li class="lvl2">
<div class="lvl3" id="n123">
paul
</div>
</li>
<li class="lvl2">
<div class="lvl3" id="n471">
john
</div>
</li>
<li class="lvl2">
<div class="lvl3" id="n951">
ringo
</div>
</li>
</ul>
R :
> library(RSelenium)
> startServer()
> mybrowser <- remoteDriver()
> mybrowser$open()
> mybrowser$navigate("http://example.com")
> list_of_beatles <- mybrowser$findElement(using = 'css selector', "ul#list1.list")
> print(unlist(strsplit(as.character(list_of_beatles$getElementText()), "\n")))
[1] "paul" "john"
[3] "ringo"
> # Let's say I want john's CSS selector; I'd want something like this:
> css_selector_of_this_thing(which(unlist(strsplit(as.character(list_of_beatles$getElementText()), "\n")) == "john"))
> # which would output, for instance, "div#n471.lvl3"
Upvotes: 1
Views: 924
Reputation: 17611
If you know the classes will be called list, lvl2 and lvl3, and that your text will be in the tag with class lvl3, then you can use XPath:
result <- mybrowser$findElement(using = 'xpath',
  "//ul[@class = 'list']/*[@class = 'lvl2']/*[@class = 'lvl3'][contains(., 'john')]")
result$getElementAttribute("outerHTML")[[1]]
# [1] "<div class=\"lvl3\" id=\"n471\">\n john\n </div>"
result$getElementTagName()[[1]] # or result$getElementAttribute("tag")[[1]]
# [1] "div"
result$getElementAttribute("class")[[1]]
# [1] "lvl3"
result$getElementAttribute("id")[[1]]
# [1] "n471"
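The tag name, id and class retrieved above can be glued together into the kind of CSS selector the question asks for. This is only a minimal sketch: build_css_selector is a hypothetical helper (it is not part of RSelenium), and it assumes the id and class names need no CSS escaping.

```r
# Hypothetical helper (not an RSelenium function): combine a tag name, an id
# and a class attribute into a CSS selector such as "div#n471.lvl3".
# Assumption: the id and the class names contain no characters that would
# need escaping in CSS.
build_css_selector <- function(tag, id = "", class = "") {
  # A class attribute may hold several space-separated class names.
  classes <- strsplit(trimws(class), "[[:space:]]+")[[1]]
  classes <- classes[nzchar(classes)]
  paste0(
    tag,
    if (nzchar(id)) paste0("#", id) else "",
    if (length(classes) > 0) paste0(".", classes, collapse = "") else ""
  )
}

# With a live session, the pieces would come from the element found above:
# build_css_selector(result$getElementTagName()[[1]],
#                    result$getElementAttribute("id")[[1]],
#                    result$getElementAttribute("class")[[1]])

# Offline, with the values printed above:
build_css_selector("div", "n471", "lvl3")
# [1] "div#n471.lvl3"
```

Because the ids are dynamic, a selector built this way is only valid for the page load it was read from.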
Or, with a shorter XPath:
result2 <- mybrowser$findElement(using = 'xpath',
"//*[@class = 'lvl3'][contains(., 'john')]")
According to OP's comment, there are occasions when it is necessary to differentiate between john and saint john and johnny. There may be XPath-based ways to go about it, but I haven't figured it out (suggestions / edits welcome). So, I'll use some regex after the initial XPath:
# use findElements (plural) to get multiple elements
result <- mybrowser$findElements(using = 'xpath',
"//*[@class = 'lvl3'][string()]")
# loop through results and gather outerHTML to examine with regex
choices <- unlist(lapply(result, function(x) x$getElementAttribute("outerHTML")))
Let's say we added johnny as another entry, then choices would look like this:
#[1] "<div class=\"lvl3\" id=\"n123\">\n paul\n </div>"
#[2] "<div class=\"lvl3\" id=\"n471\">\n john\n </div>"
#[3] "<div class=\"lvl3\" id=\"n951\">\n ringo\n </div>"
#[4] "<div class=\"lvl3\" id=\"n952\">\n johnny\n </div>"
We can then use regex to find the right one:
# \\W+ to look for non-word characters (i.e. [^[:alnum:]_])
# between the ">" and "<" that enclose the text
choice <- which(grepl(">\\W+john\\W+<", choices, perl = TRUE))
result[[choice]]$getElementAttribute("outerHTML")[[1]]
#[1] "<div class=\"lvl3\" id=\"n471\">\n john\n </div>"
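The matching step can be tried without a browser by rebuilding choices by hand. The sketch below adds two made-up entries (ids n953 and n954 are assumptions, not from the page) to show that the pattern rejects saint john, and that it also misses an element whose text has no surrounding whitespace, because \\W+ needs at least one character on each side. Stripping the tags and comparing the trimmed text is one possible alternative.

```r
# Hand-built copy of `choices`; entries 5 and 6 are hypothetical additions.
choices <- c(
  "<div class=\"lvl3\" id=\"n123\">\n paul\n </div>",
  "<div class=\"lvl3\" id=\"n471\">\n john\n </div>",
  "<div class=\"lvl3\" id=\"n951\">\n ringo\n </div>",
  "<div class=\"lvl3\" id=\"n952\">\n johnny\n </div>",
  "<div class=\"lvl3\" id=\"n953\">\n saint john\n </div>",
  "<div class=\"lvl3\" id=\"n954\">john</div>"
)

# The pattern from above: at least one non-word character on each side.
regex_hits <- which(grepl(">\\W+john\\W+<", choices, perl = TRUE))
regex_hits
# [1] 2

# Alternative: drop the tags, trim whitespace, compare the text exactly.
labels <- trimws(gsub("<[^>]+>", "", choices))
exact_hits <- which(labels == "john")
exact_hits
# [1] 2 6
```

If the whitespace-free case can occur on the real page, \\W* in place of \\W+ would also accept it.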
The methods displayed above will work to get the tag name, class, and id here.
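On the XPath-only route mentioned earlier, one possibility is the XPath 1.0 function normalize-space(), which trims leading and trailing whitespace and collapses internal runs, so the text can be compared for equality instead of with contains(). The following is a sketch rather than a tested RSelenium recipe: xpath_exact_text is a hypothetical helper, it assumes neither argument contains a single quote, and the offline check only runs if the xml2 package happens to be installed.

```r
# Hypothetical helper: XPath for any element whose class attribute is `cls`
# and whose whitespace-normalized text equals `text` exactly.
# Assumption: `cls` and `text` contain no single quotes.
xpath_exact_text <- function(cls, text) {
  sprintf("//*[@class = '%s'][normalize-space(.) = '%s']", cls, text)
}

xp <- xpath_exact_text("lvl3", "john")

# Optional offline check against a small copy of the page (requires xml2).
if (requireNamespace("xml2", quietly = TRUE)) {
  page <- xml2::read_html(paste0(
    "<ul class='list' id='list1'>",
    "<li class='lvl2'><div class='lvl3' id='n471'>\n john\n </div></li>",
    "<li class='lvl2'><div class='lvl3' id='n952'>\n johnny\n </div></li>",
    "</ul>"
  ))
  hits <- xml2::xml_find_all(page, xp)
  print(xml2::xml_attr(hits, "id"))
}

# With a live session, the same string would be passed to findElement:
# result3 <- mybrowser$findElement(using = 'xpath', xp)
```

In the sample markup only the john div satisfies the predicate, since the normalized text of the other entry is johnny.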
Upvotes: 1