Terry Li

Reputation: 17268

How to get absolute path of an html element

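import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Connection.get() already returns a parsed Document, so there is no need to re-parse the HTML string
Document doc = Jsoup.connect(url).timeout(1000 * 1000).get();
Elements headings = doc.select("div h2");
for (Element e : headings) {
  // get absolute path of element e
}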

It seems there's no way of doing that with Jsoup alone. If that's the case, is there another Java package that can help achieve it?

Upvotes: 3

Views: 7208

Answers (4)

Dimitre Novatchev

Reputation: 243479

There are solutions for this problem.

Once upon a time I provided this answer:

https://stackoverflow.com/a/4747858/36305

Upvotes: 1

Michael Kay

Reputation: 163342

There is no such thing as "the" absolute path for an element. There are many different paths that will select an element. Examples of such paths that people sometimes ask for are:

/a/b/c/d
/a[1]/b[2]/c[3]/d[4]
/*[1]/*[2]/*[3]/*[4]

The problem with the first two cases is that they don't work if there are namespaces involved. The third path solves that problem, but the path isn't as informative as people would sometimes like. If you want a path that is both informative and independent of the namespace context, then you need something that uses predicates of the form *[local-name()='a' and namespace-uri()='......'].
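A minimal sketch of the first and third path styles, using only the JDK's W3C DOM (org.w3c.dom) rather than Jsoup, and assuming the page can be parsed as well-formed XML/XHTML. The class and method names (`ElementPaths`, `namedPath`, `wildcardPath`) are hypothetical. Note that in `/a[1]/b[2]` the index counts only same-named siblings, while in `/*[1]/*[2]` it counts all element siblings:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class ElementPaths {

    // Parses a small, well-formed XML/XHTML string into a W3C DOM.
    // The factory is left non-namespace-aware (the JDK default), so
    // getNodeName() returns the raw qualified name, prefix included;
    // that is exactly why the named style breaks down with namespaces.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // 1-based position of n among its element siblings; when sameNameOnly
    // is true, only preceding siblings with the same node name are counted.
    static int position(Node n, boolean sameNameOnly) {
        int pos = 1;
        for (Node s = n.getPreviousSibling(); s != null; s = s.getPreviousSibling()) {
            if (s.getNodeType() == Node.ELEMENT_NODE
                    && (!sameNameOnly || s.getNodeName().equals(n.getNodeName()))) {
                pos++;
            }
        }
        return pos;
    }

    // Style /a[1]/b[2]/...: each step is the element name plus its
    // position among same-named siblings, walking up to the root element.
    static String namedPath(Element e) {
        StringBuilder sb = new StringBuilder();
        for (Node n = e; n != null && n.getNodeType() == Node.ELEMENT_NODE; n = n.getParentNode()) {
            sb.insert(0, "/" + n.getNodeName() + "[" + position(n, true) + "]");
        }
        return sb.toString();
    }

    // Style /*[1]/*[2]/...: each step is the position among all element
    // siblings, so no names (and no namespace prefixes) appear in the path.
    static String wildcardPath(Element e) {
        StringBuilder sb = new StringBuilder();
        for (Node n = e; n != null && n.getNodeType() == Node.ELEMENT_NODE; n = n.getParentNode()) {
            sb.insert(0, "/*[" + position(n, false) + "]");
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<a><b/><c/><b><d/></b></a>");
        Element d = (Element) doc.getElementsByTagName("d").item(0);
        System.out.println(namedPath(d));    // /a[1]/b[2]/d[1]
        System.out.println(wildcardPath(d)); // /*[1]/*[3]/*[1]
    }
}
```

Walking the parent chain stops at the Document node, which is not an element, so every path starts at the root element.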

It's the difficulty with namespaces that explains why you don't find many library routines that return the path to an element.
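A minimal sketch of the namespace-independent form, again using only the JDK (namespace-aware W3C DOM plus javax.xml.xpath) and assuming well-formed XHTML input; the class name `NamespaceSafePath` is hypothetical. Each step uses a `local-name()`/`namespace-uri()` predicate followed by a positional predicate, so the path can be evaluated without registering any namespace prefixes:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class NamespaceSafePath {

    // Namespace awareness is required so getLocalName() and
    // getNamespaceURI() are populated on every element.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        return f.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // XPath's namespace-uri() returns "" for elements in no namespace.
    static String nsOf(Node n) {
        return n.getNamespaceURI() == null ? "" : n.getNamespaceURI();
    }

    // Each step: *[local-name()='x' and namespace-uri()='u'][n], where n is
    // the 1-based position among siblings with the same local name and URI.
    // Assumes the namespace URI contains no apostrophe (it is quoted with ').
    static String path(Element e) {
        StringBuilder sb = new StringBuilder();
        for (Node n = e; n != null && n.getNodeType() == Node.ELEMENT_NODE; n = n.getParentNode()) {
            String local = n.getLocalName();
            String ns = nsOf(n);
            int pos = 1;
            for (Node s = n.getPreviousSibling(); s != null; s = s.getPreviousSibling()) {
                if (s.getNodeType() == Node.ELEMENT_NODE
                        && local.equals(s.getLocalName()) && ns.equals(nsOf(s))) {
                    pos++;
                }
            }
            sb.insert(0, "/*[local-name()='" + local + "' and namespace-uri()='" + ns + "'][" + pos + "]");
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<h:html xmlns:h='http://www.w3.org/1999/xhtml'>"
                + "<h:body><h:div/><h:div><h:h2/></h:div></h:body></h:html>");
        Element h2 = (Element) doc.getElementsByTagNameNS("*", "h2").item(0);
        String p = path(h2);
        System.out.println(p);
        // Round trip: evaluate the path with no NamespaceContext configured
        Node found = (Node) XPathFactory.newInstance().newXPath()
                .evaluate(p, doc, XPathConstants.NODE);
        System.out.println(h2.isSameNode(found));
    }
}
```

The paths are long, but they select the same element regardless of which prefixes (if any) the document or the caller uses.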

Upvotes: 0

HashimR

Reputation: 3833

Jsoup still doesn't support getting an XPath directly from an element.

There is still a pending implementation suggestion.

Upvotes: 2

UVM

Reputation: 9914

The following article explains how to apply XPath in jsoup:

jsoup: Java HTML Parser

At the end of the article, the author comments:

"If you like to extract specific data from the HTML, then Jsoup is the way to go."

Upvotes: 0
