How to find text of all child nodes

Question

I am using WebDriver to crawl websites, looking for links decorated with a magic constant, except that the text may be formatted:

magic

and to many levels

A magic evening

I don't know whether or not it is formatted, or if it is, how many levels deep it goes, as I'm searching through arbitrary sites.

My code looks something like this:

List<WebElement> links = driver.findElements(By.tagName("a"));
for (WebElement link : links) {
    // Collect every descendant element of the link, at any depth
    List<WebElement> children = link.findElements(By.tagName("*"));
    for (WebElement child : children) {
        if (myPattern.matcher(child.getText()).matches()) {
            System.out.println("found match!");
        }
    }
}

But this fails to find the match.

Any ideas on how to determine if there is a match?

Eran Medan · Accepted Answer

Try using jsoup to get the text content; from there it is pretty straightforward:

String html = "...magic...";
String string = Jsoup.parse(html).text(); //A magic evening
if(string.contains("magic")){ //you can optimize to have word match, e.g. not "magical"
    //it's a match
}
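
The whole-word check mentioned in the comment above could be sketched with a regular expression. This is only a minimal illustration using the standard java.util.regex package; the class name MagicWordMatcher and the helpers wholeWord and containsWord are made up for this example and are not part of jsoup or WebDriver. Note that it uses find(), which looks for the pattern anywhere in the text, whereas matches() (as used in the question's code) succeeds only when the entire string matches the pattern.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of jsoup or Selenium.
class MagicWordMatcher {

    // Builds a pattern that matches `word` only as a whole word.
    // Pattern.quote escapes any regex metacharacters in the word.
    static Pattern wholeWord(String word) {
        return Pattern.compile("\\b" + Pattern.quote(word) + "\\b");
    }

    // True if `text` contains `word` as a whole word anywhere in it.
    static boolean containsWord(String text, String word) {
        // find() searches anywhere in the text;
        // matches() would require the entire text to be the word.
        return wholeWord(word).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsWord("A magic evening", "magic"));   // true
        System.out.println(containsWord("A magical evening", "magic")); // false
        // matches() fails here because the text is longer than the word
        System.out.println(wholeWord("magic").matcher("A magic evening").matches()); // false
    }
}
```

The text passed to containsWord would be whatever flattened string you extracted, for example the result of Jsoup.parse(html).text().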

Edit:

I haven't used WebDriver/Selenium in a long time, but I've seen something like this, which looks like it might have the same effect:

// executeScript returns Object, so the result has to be cast to String
String innerText = (String) ((JavascriptExecutor) driver).executeScript("return arguments[0].innerText", element);
