Asaf Nevo

Reputation: 11678

Java HtmlUnit getByXPath() for relative search

I'm using HtmlUnit with Java to scrape my website.

I have the following div inside my HTML page:

<div class="items">
   <div class="item_title">
     <span class="title">TEXT</span>
  </div>
</div>

I have the HtmlDivision object that contains the full div. I'm trying to get to the span element using the following call:

List<?> titleSpans = div.getByXPath("/span");

But this returns all the spans in the page.

How can I search for span elements that are only in this single HtmlDivision element?

Upvotes: 1

Views: 189

Answers (1)

hfontanez

Reputation: 6168

In XPath, regardless of what technology you use, a path that starts with a single slash (/) is an absolute path from the document root. A path that starts with a double slash (//) matches descendants at any depth, so you do not have to spell out the full path to the desired element, but it is still evaluated from the document root, not from the current node. That is why "//span" matches every span in the page.

If you want a more specific match, add a predicate. For example, "//span[@class='title']"

UPDATE: To limit the search to the current node, put a dot (.) before the double slash. For example, ".//span[@class='title']"
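
To see the difference between the three forms, here is a minimal sketch. It uses the JDK's built-in javax.xml.xpath instead of HtmlUnit so that it runs without extra dependencies; I am assuming HtmlUnit's getByXPath() follows the same XPath rules for the context node. The page content, the class name RelativeXPathDemo and the helper countFromDiv are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RelativeXPathDemo {
    // Hypothetical page: the div from the question plus one unrelated span outside it.
    static final String PAGE =
        "<html><body>"
        + "<span class=\"other\">OUTSIDE</span>"
        + "<div class=\"items\"><div class=\"item_title\">"
        + "<span class=\"title\">TEXT</span>"
        + "</div></div>"
        + "</body></html>";

    /** Evaluates expr with the div.items element as the context node and returns the match count. */
    static int countFromDiv(String expr) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(PAGE)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Locate the outer div first; it plays the role of the HtmlDivision object.
        Node div = (Node) xpath.evaluate("//div[@class='items']", doc, XPathConstants.NODE);
        NodeList hits = (NodeList) xpath.evaluate(expr, div, XPathConstants.NODESET);
        return hits.getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countFromDiv("/span"));   // 0: a span would have to be the root element
        System.out.println(countFromDiv("//span"));  // 2: every span in the document
        System.out.println(countFromDiv(".//span")); // 1: only spans below the div
        System.out.println(countFromDiv("span"));    // 0: the span is not a direct child of the div
    }
}
```

With HtmlUnit, the equivalent call would presumably be div.getByXPath(".//span") on the HtmlDivision from the question.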

Upvotes: 1
