rsj

Reputation: 788

How to find text of all child nodes

I am using WebDriver to crawl websites, looking for links whose text contains a magic constant. The complication is that the text may be wrapped in formatting elements:

<a href="blah" ..><span blah>magic</span></a>

and possibly nested several levels deep:

<a href="blah" ..><span blah>A <span blah><b>magic</b></span> evening</span></a>

I don't know whether or not it is formatted, or if it is, how many levels deep it goes, as I'm searching through arbitrary sites.

My code looks something like this:

List<WebElement> links = driver.findElements(By.tagName("a"));
for (WebElement link : links) {
    // every descendant element of the link, at any depth
    List<WebElement> children = link.findElements(By.tagName("*"));
    for (WebElement child : children) {
        if (myPattern.matcher(child.getText()).matches()) {
            System.out.println("found match!");
        }
    }
}

But this fails to find the match.

Any ideas on how to determine if there is a match?

Upvotes: 1

Views: 2264

Answers (2)

Eran Medan

Reputation: 45765

Try using jsoup to get the text content; from there it is pretty straightforward:

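// requires org.jsoup.Jsoup
String html = "<a href=\"blah\"><span blah>...<b>magic</b>...</span></a>";
String string = Jsoup.parse(html).text(); // text with all tags stripped, e.g. "A magic evening" for the second link in the question
if (string.contains("magic")) { // you can tighten this to a whole-word match, so that "magical" is not accepted
    // it's a match
}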
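The whole-word check mentioned in the comment above could be sketched with java.util.regex. This is only an illustration: `WordMatch` and `containsWord` are hypothetical names, and the input is assumed to be the already extracted plain text.

```java
import java.util.regex.Pattern;

class WordMatch {
    // Hypothetical helper: true if `word` occurs in `text` as a whole word.
    static boolean containsWord(String text, String word) {
        // \b is a word boundary; Pattern.quote treats `word` as a literal.
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b");
        // find() looks for a match anywhere in the input, whereas
        // matches() would require the whole input to match the pattern.
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsWord("A magic evening", "magic"));   // true
        System.out.println(containsWord("A magical evening", "magic")); // false
    }
}
```

Using `find()` rather than `matches()` matters here, because the extracted text usually contains more than the constant itself.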

Edit:

I haven't used WebDriver/Selenium in a long time, but I've seen something like this, which looks like it might have the same effect:

// executeScript returns Object, so the result needs a cast
String innerText = (String) ((JavascriptExecutor) driver).executeScript("return arguments[0].innerText;", element);

Upvotes: 1

Dimitre Novatchev

Reputation: 243529

In case you can use XPath, one useful XPath expression is:

//a[span[.//text()[. = 'magic']]]

This selects every a element in the XML document that has a span child with a text-node descendant whose string value is exactly the string "magic".
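To see what the expression selects, here is a minimal sketch that evaluates it with the JDK's built-in javax.xml.xpath API against a small, well-formed sample. The sample markup, the class name `XPathMagic`, and the helper `count` are assumptions made for illustration; real pages are often not well-formed XML. With WebDriver, the same expression string could presumably be passed to `By.xpath(...)` instead. The second expression, `//a[contains(., 'magic')]`, is a looser variant that is not part of the answer above: it tests the concatenated text of the whole link.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathMagic {
    // Illustrative sample: four links with differently structured text.
    static final String SAMPLE =
        "<root>"
        + "<a href='1'><span>magic</span></a>"
        + "<a href='2'><span>A <span><b>magic</b></span> evening</span></a>"
        + "<a href='3'><span>A magic evening</span></a>"
        + "<a href='4'>magic</a>"
        + "</root>";

    // Hypothetical helper: number of nodes that `expr` selects in `xml`.
    static int count(String xml, String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                expr, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Links 1 and 2: a span child with a text node equal to "magic".
        System.out.println(count(SAMPLE, "//a[span[.//text()[. = 'magic']]]")); // 2
        // All four links: the link's full text contains "magic".
        System.out.println(count(SAMPLE, "//a[contains(., 'magic')]"));         // 4
    }
}
```

Link 3 is missed by the first expression because its only text node is "A magic evening", not "magic", and link 4 is missed because it has no span child.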

Upvotes: 2
