Reputation: 1032
I'm using HtmlUnit to scrape data and I'm getting used to the syntax of XPath. However, I've run into a problem.
The element I need to pull varies between pages: sometimes it is a "span" element and sometimes it is an "a" element (a link). The reason is simply that the item I am scraping sometimes has a link and sometimes is just plain text. What stays the same, however, is an attribute called "data-reactid", which always has a set value of, let's just say, 99. I've been reading and experimenting, and have been trying things like this:
HtmlElement element = (HtmlElement) myPage.getFirstByXPath("//@data-reactid='99'");
System.out.println(element.getTextContent());
I am getting the following error:
java.lang.ClassCastException: java.lang.Boolean cannot be cast to com.gargoylesoftware.htmlunit.html.HtmlElement
Why getFirstByXPath() is returning a boolean is beyond me.
So my question is, how can I access an element by a specified attribute and value, when I do not know what type the element will be?
Thanks!
Upvotes: 0
Views: 534
Reputation: 928
It's giving you a boolean because your XPath is asking for a boolean. Your XPath,
//@data-reactid='99'
is asking the question "does there exist a data-reactid attribute anywhere in my document with a value of 99?"
What you want is a predicate -- that is, "select elements where this logical condition is true". For all elements (we'll use a * wildcard since we don't know the name) that have a @data-reactid of 99:
//*[@data-reactid = '99']
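To see the difference between the two expressions outside of HtmlUnit, here is a minimal sketch that uses the JDK's built-in javax.xml.xpath API instead. The class name ReactIdXPathDemo, the helper methods, and the sample markup are all made up for illustration; only the two XPath strings come from the discussion above.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class ReactIdXPathDemo {

    // Hypothetical sample pages: the target is a <span> on one and an <a> on the other.
    static final String SPAN_PAGE =
        "<html><body><span data-reactid=\"99\">Plain text</span></body></html>";
    static final String LINK_PAGE =
        "<html><body><a href=\"/item\" data-reactid=\"99\">Linked text</a></body></html>";

    // The expression from the question is a comparison, so it evaluates to
    // true or false rather than to an element.
    static boolean attributeExists(String xml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return (Boolean) xpath.evaluate("//@data-reactid='99'",
                    new InputSource(new StringReader(xml)), XPathConstants.BOOLEAN);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    // The predicate form selects the first element of any name that carries
    // the attribute; it returns null when nothing matches.
    static Element findByReactId(String xml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return (Element) xpath.evaluate("//*[@data-reactid = '99']",
                    new InputSource(new StringReader(xml)), XPathConstants.NODE);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String page : new String[] {SPAN_PAGE, LINK_PAGE}) {
            Element element = findByReactId(page);
            System.out.println(attributeExists(page) + " / "
                    + element.getTagName() + ": " + element.getTextContent());
        }
    }
}
```

In HtmlUnit the same predicate expression can be passed to getFirstByXPath(), and the result can then be cast to HtmlElement whether the match turns out to be a span or a link, since both are HtmlElement subclasses.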
Upvotes: 1