François M.

Reputation: 4278

How do I get the HTML of the list element whose text equals "some value"?

I have the following HTML code :

<ul class="list" role="listbox" id="list1">

  <li class="lvl2">
    <div class="lvl3" id="lvl3-nb-1">
      choice1
    </div>
  </li>

  <li class="lvl2">
    <div class="lvl3" id="lvl3-nb-2">
      choice2
    </div>
  </li>

  <li class="lvl2">
    <div class="lvl3" id="lvl3-nb-3">
      choice3
    </div>
  </li>

</ul>

I'd like to get the HTML (outer HTML, HTML + element, selector, XPath, it doesn't matter) of the element whose text is "choice2".

How can I do that with RSelenium ?

Thanks

EDIT for clarification: the ids of the list elements are dynamic (basically random), so the solution cannot hard-code their ids or selectors. However, I know the values choice1, choice2 and choice3 for sure, and I know everything else as well (for instance, the classes will be called list, lvl2 and lvl3).

Attempt at a reproducible example :

HTML :

<ul class="list" id="list1">
  <li class="lvl2">
    <div class="lvl3" id="n123">
      paul
    </div>
  </li>
  <li class="lvl2">
    <div class="lvl3" id="n471">
      john
    </div>
  </li>
  <li class="lvl2">
    <div class="lvl3" id="n951">
      ringo
    </div>
  </li>
</ul>

R :

> library(RSelenium)
> startServer()
> mybrowser <- remoteDriver()
> mybrowser$open()
> mybrowser$navigate("http://example.com")
> list_of_beatles <- mybrowser$findElement(using = 'css selector', "ul#list1.list")

> print(unlist(strsplit(as.character(list_of_beatles$getElementText()), "\n")))
[1] "paul"  "john"  "ringo"

> # Let's say I want john's CSS selector; I'd want something like this (pseudocode):
> css_selector_of_this_thing(which(unlist(strsplit(as.character(list_of_beatles$getElementText()), "\n")) == "john"))
> # Which would output, for instance, "div#n471.lvl3"

Upvotes: 1

Views: 924

Answers (1)

Jota

Reputation: 17611

If you know the classes will be called list, lvl2 and lvl3, and that your text will be in the tag with class lvl3, then you can use XPath:

result <- mybrowser$findElement(using = 'xpath',
    "//ul[@class = 'list']/*[@class = 'lvl2']/*[@class = 'lvl3'][contains(., 'john')]")

result$getElementAttribute("outerHTML")[[1]]
# [1] "<div class=\"lvl3\" id=\"n471\">\n      john\n    </div>"

result$getElementTagName()[[1]] # or result$getElementAttribute("tag")[[1]]
# [1] "div"

result$getElementAttribute("class")[[1]]
# [1] "lvl3"

result$getElementAttribute("id")[[1]]
# [1] "n471"
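
The question asked for something like a CSS selector. As a minimal sketch, the tag name, id, and class retrieved above can be pasted together in base R. Note that build_css_selector is a hypothetical helper written for this answer, not part of RSelenium, and it assumes the id and class names contain no characters that would need CSS escaping.

```r
# Hypothetical helper (not part of RSelenium): build a CSS selector
# from a tag name, an id, and a whitespace-separated class attribute.
build_css_selector <- function(tag, id = "", class = "") {
  id_part <- if (nzchar(id)) paste0("#", id) else ""
  # Split the class attribute on whitespace and drop empty pieces
  classes <- strsplit(trimws(class), "\\s+")[[1]]
  classes <- classes[nzchar(classes)]
  class_part <- if (length(classes) > 0) {
    paste0(".", classes, collapse = "")
  } else {
    ""
  }
  paste0(tag, id_part, class_part)
}

# Values copied from the outputs shown above
selector <- build_css_selector("div", id = "n471", class = "lvl3")
print(selector)
```

With a live session, the three arguments would presumably come from result$getElementTagName()[[1]], result$getElementAttribute("id")[[1]] and result$getElementAttribute("class")[[1]].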

Or more simply:

result2 <- mybrowser$findElement(using = 'xpath',
    "//*[@class = 'lvl3'][contains(., 'john')]")

Edit:

According to OP's comment, there are occasions when it is necessary to differentiate between john, saint john and johnny. There may be XPath-based ways to go about it, but I haven't figured them out (suggestions / edits welcome). So, I'll use some regex after the initial XPath:

# use findElements (plural) to get multiple elements
result <- mybrowser$findElements(using = 'xpath',
    "//*[@class = 'lvl3'][string()]")

# loop through results and gather outerHTML to examine with regex
choices <- unlist(lapply(result, function(x) x$getElementAttribute("outerHTML")))

Let's say we added johnny as another entry, then choices would look like this:

#[1] "<div class=\"lvl3\" id=\"n123\">\n      paul\n    </div>"  
#[2] "<div class=\"lvl3\" id=\"n471\">\n      john\n    </div>"  
#[3] "<div class=\"lvl3\" id=\"n951\">\n      ringo\n    </div>" 
#[4] "<div class=\"lvl3\" id=\"n952\">\n      johnny\n    </div>"

We can then use regex to find the right one:

# \\W+ to look for non-word characters (i.e. [^[:alnum:]_])
# between the ">" and "<" that enclose the text 
choice <- which(grepl(">\\W+john\\W+<", choices, perl = TRUE))

result[[choice]]$getElementAttribute("outerHTML")[[1]]
#[1] "<div class=\"lvl3\" id=\"n471\">\n      john\n    </div>"
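
The regex above needs at least one non-word character on each side of the name, which holds here because of the whitespace around the text. A possible alternative, sketched below in base R, is to strip the tags, trim the whitespace, and compare with ==. This assumes each element contains only plain text (no nested markup or HTML entities). The choices vector is copied from the output above, and the "saint john" entry with id n953 is a made-up addition for illustration.

```r
# Sample outerHTML strings; the last entry is hypothetical
choices <- c(
  "<div class=\"lvl3\" id=\"n123\">\n      paul\n    </div>",
  "<div class=\"lvl3\" id=\"n471\">\n      john\n    </div>",
  "<div class=\"lvl3\" id=\"n951\">\n      ringo\n    </div>",
  "<div class=\"lvl3\" id=\"n952\">\n      johnny\n    </div>",
  "<div class=\"lvl3\" id=\"n953\">\n      saint john\n    </div>"
)

# Remove anything that looks like a tag, then trim surrounding whitespace
texts <- trimws(gsub("<[^>]+>", "", choices))

# Exact comparison: "johnny" and "saint john" do not match
choice <- which(texts == "john")
print(choice)
```

In a live session, calling getElementText() on each element instead of parsing outerHTML may avoid the tag stripping altogether.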

The methods shown earlier (getElementTagName and getElementAttribute) can then be called on result[[choice]] to get the tag name, class, and id.
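
One XPath-based candidate for the exact match is the standard XPath 1.0 function normalize-space(), which trims leading and trailing whitespace before the comparison. The sketch below only builds the expression string. The helper exact_text_xpath is hypothetical, it assumes the text contains no single quotes, and the expression has not been tested against the page in the question.

```r
# Hypothetical helper: XPath for elements of a given class whose
# whitespace-normalized text equals `text` exactly.
exact_text_xpath <- function(class, text) {
  sprintf("//*[@class = '%s'][normalize-space(.) = '%s']", class, text)
}

xp <- exact_text_xpath("lvl3", "john")
print(xp)

# With a live session, this would be passed to findElement, e.g.:
# result3 <- mybrowser$findElement(using = 'xpath', xp)
```

Because the comparison uses = rather than contains(), johnny and saint john should not match.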

Upvotes: 1
