quantum285

Reputation: 1032

HtmlUnit - getByXPath with unknown element type

I'm using HtmlUnit to scrape data and I'm getting used to the syntax of XPath. However, I've run into a problem.

I need to pull an element whose type varies between pages: sometimes it is a "span" element and sometimes it is an "a" element (a link). The reason is simply that the item I am scraping sometimes has a link and sometimes is just plain text. What stays the same, however, is an attribute called "data-reactid", which always has a fixed value; let's say 99. I've been reading and experimenting, and have been trying things like this:

HtmlElement element = (HtmlElement) myPage.getFirstByXPath("//@data-reactid='99'");
System.out.println(element.getTextContent());

I am getting the following error:

java.lang.ClassCastException: java.lang.Boolean cannot be cast to com.gargoylesoftware.htmlunit.html.HtmlElement

Why getFirstByXPath() is returning a boolean is beyond me.

So my question is, how can I access an element by a specified attribute and value, when I do not know what type the element will be?

Thanks!

Upvotes: 0

Views: 534

Answers (1)

bjimba

Reputation: 928

It's giving you a boolean because your XPath is asking for a boolean. Your XPath,

//@data-reactid='99'

is asking the question "does there exist a data-reactid attribute anywhere in my document with a value of 99?"

What you want is a predicate -- that is, "select elements where this logical condition is true". For all elements (we'll use a * wildcard since we don't know the name) that have a @data-reactid of 99:

//*[@data-reactid = '99']
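
To see the difference between the two expressions above without pulling in HtmlUnit, here is a minimal sketch that runs both through the JDK's built-in XPath engine (javax.xml.xpath). This swaps in a different XPath implementation purely for illustration; the class name, helper names, and sample markup are made up for this example and are not part of HtmlUnit or the original question.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class XPathPredicateDemo {

    // Hypothetical helper: evaluates an XPath expression against an XML string
    // using the JDK's XPath engine (an assumption for this sketch, not HtmlUnit).
    private static Object evaluate(String expression, String xml,
                                   javax.xml.namespace.QName returnType) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression,
                    new InputSource(new StringReader(xml)), returnType);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    // The comparison form: a node-set compared with a string yields a boolean.
    static boolean attributeExists(String xml) {
        return (Boolean) evaluate("//@data-reactid='99'", xml, XPathConstants.BOOLEAN);
    }

    // The predicate form: selects the first element of any name whose
    // data-reactid attribute equals '99', or returns null if none matches.
    static String firstMatchText(String xml) {
        Node node = (Node) evaluate("//*[@data-reactid='99']", xml, XPathConstants.NODE);
        return node == null ? null : node.getTextContent();
    }

    public static void main(String[] args) {
        // Made-up sample markup: the same attribute on an "a" and on a "span".
        String withLink = "<div><a data-reactid=\"99\" href=\"#\">linked item</a></div>";
        String withSpan = "<div><span data-reactid=\"99\">plain item</span></div>";

        System.out.println(attributeExists(withLink));
        System.out.println(firstMatchText(withLink));
        System.out.println(firstMatchText(withSpan));
    }
}
```

With HtmlUnit, the same predicate expression should be usable directly in the call from the question, i.e. myPage.getFirstByXPath("//*[@data-reactid='99']"). The existing cast to HtmlElement then applies to both the "span" and the "a" case. A null check is still worthwhile for pages where nothing matches.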

Upvotes: 1
