Reputation: 788
I am using WebDriver to crawl websites, looking for links whose text contains a magic constant. The complication is that the text may be wrapped in formatting tags:
<a href="blah" ..><span blah>magic</span></a>
and the nesting may go several levels deep:
<a href="blah" ..><span blah>A <span blah><b>magic</b></span> evening</span></a>
I don't know whether or not it is formatted, or if it is, how many levels deep it goes, as I'm searching through arbitrary sites.
My code looks something like this:
List<WebElement> links = driver.findElements(By.tagName("a"));
for (WebElement link : links) {
    List<WebElement> children = link.findElements(By.tagName("*"));
    for (WebElement child : children) {
        if (myPattern.matcher(child.getText()).matches()) {
            System.out.println("found match!");
        }
    }
}
But this fails to find the match.
Any ideas on how to determine if there is a match?
Upvotes: 1
Views: 2264
Reputation: 45765
Try using jsoup to extract the text content; from there it is pretty straightforward:
String html = "<a href=\"blah\"><span blah>A <span blah><b>magic</b></span> evening</span></a>";
String string = Jsoup.parse(html).text(); // "A magic evening"
if (string.contains("magic")) { // you can tighten this to a whole-word match, so that "magical" does not match
//it's a match
}
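The whole-word check mentioned in the comment above could be sketched with a regular expression and word boundaries. This is only an illustration: the class `WordMatch` and the helper `containsWord` are hypothetical names, not part of jsoup or Selenium, and the input is assumed to be text that has already been extracted.

```java
import java.util.regex.Pattern;

final class WordMatch {
    // Hypothetical helper: true if `word` occurs in `text` as a whole word.
    // Pattern.quote() keeps any regex metacharacters in `word` literal.
    static boolean containsWord(String text, String word) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b");
        // find() searches anywhere in the text; matches() would require
        // the entire string to match the pattern.
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsWord("A magic evening", "magic"));   // true
        System.out.println(containsWord("A magical evening", "magic")); // false
    }
}
```

Note that `\b` treats only letters, digits and underscore as word characters, so a constant containing punctuation at its edges would need a different boundary rule.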
Edit:
I haven't used WebDriver/Selenium in a long time, but I've seen something like this, which looks like it might have the same effect (executeScript returns an Object, so the result has to be cast):
String innerText = (String) ((JavascriptExecutor) driver).executeScript("return arguments[0].innerText;", element);
Upvotes: 1
Reputation: 243529
In case you can use XPath, one useful XPath expression is:
//a[span[.//text()[. = 'magic']]]
This selects all a elements in the XML document that have a span child that has a text-node descendant whose string value is the string "magic".
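To see what the expression does and does not select, here is a minimal sketch that evaluates it with the JDK's built-in `javax.xml.xpath` API. It assumes the markup is well-formed XML (so the valueless `blah` attributes from the question are dropped), and the class `XPathMagic` and the method `countMatches` are hypothetical names used only for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathMagic {
    // Hypothetical helper: counts the <a> elements selected by the expression.
    static int countMatches(String xml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            NodeList hits = (NodeList) xpath.evaluate(
                    "//a[span[.//text()[. = 'magic']]]",
                    new InputSource(new StringReader(xml)),
                    XPathConstants.NODESET);
            return hits.getLength();
        } catch (XPathExpressionException e) {
            // Also raised when the input is not well-formed XML.
            throw new IllegalArgumentException("could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root>"
                + "<a href=\"x\"><span>A <span><b>magic</b></span> evening</span></a>"
                + "<a href=\"y\"><span>magical</span></a>"   // text node is not exactly 'magic'
                + "<a href=\"z\">magic</a>"                  // no span child
                + "</root>";
        System.out.println(countMatches(xml)); // 1
    }
}
```

Only the first link is selected: the comparison is an exact match against a single text node, and the `a` element must have a `span` as a direct child.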
Upvotes: 2