Reputation: 17268
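import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// get() already returns a parsed Document, so there is no need to re-parse the HTML string
Document doc = Jsoup.connect(url).timeout(1000 * 1000).get(); // timeout is in milliseconds
Elements headings = doc.select("div h2");
for (Element e : headings) {
    // get the absolute path of element e
}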
It seems there is no way to do that using Jsoup alone. If so, is there another Java package that can help achieve it?
Upvotes: 3
Views: 7208
Reputation: 243479
There are solutions for this problem.
Once upon a time I provided this answer:
https://stackoverflow.com/a/4747858/36305
Upvotes: 1
Reputation: 163342
There is no such thing as "the" absolute path for an element. There are many different paths that will select an element. Examples of such paths that people sometimes ask for are:
/a/b/c/d
/a[1]/b[2]/c[3]/d[4]
/*[1]/*[2]/*[3]/*[4]
The problem with the first two forms is that they don't work if namespaces are involved. The third form solves that problem, but the path isn't as informative as people would sometimes like. If you want a path that is both informative and independent of the namespace context, you need predicates of the form *[local-name()='a' and namespace-uri()='......'].
It's the difficulty with namespaces that means you don't find many library routines that return the path to an element.
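To make the trade-off concrete, here is a minimal sketch that builds the third form and the predicate form by walking up from an element to the root. It assumes a namespace-aware W3C DOM from the JDK (org.w3c.dom), not Jsoup's own node classes, so a Jsoup document would first have to be converted. The class ElementPaths and all of its method names are hypothetical, and the sketch assumes namespace URIs contain no apostrophes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class; none of these names come from a library.
class ElementPaths {

    // Namespace URI of a node, with "no namespace" normalised to "".
    static String ns(Node n) {
        return n.getNamespaceURI() == null ? "" : n.getNamespaceURI();
    }

    // Purely positional path of the form /*[1]/*[2]/...
    static String positionalPath(Element el) {
        StringBuilder path = new StringBuilder();
        for (Node n = el; n instanceof Element; n = n.getParentNode()) {
            int index = 1; // XPath positions are 1-based
            for (Node s = n.getPreviousSibling(); s != null; s = s.getPreviousSibling()) {
                if (s instanceof Element) index++;
            }
            path.insert(0, "/*[" + index + "]");
        }
        return path.toString();
    }

    // Informative path that does not depend on prefix bindings.
    // Assumption: namespace URIs contain no apostrophes.
    static String namespaceSafePath(Element el) {
        StringBuilder path = new StringBuilder();
        for (Node n = el; n instanceof Element; n = n.getParentNode()) {
            String local = n.getLocalName();
            String uri = ns(n);
            int index = 1; // position among siblings with the same expanded name
            for (Node s = n.getPreviousSibling(); s != null; s = s.getPreviousSibling()) {
                if (s instanceof Element && local.equals(s.getLocalName()) && uri.equals(ns(s))) {
                    index++;
                }
            }
            path.insert(0, "/*[local-name()='" + local + "' and namespace-uri()='" + uri + "'][" + index + "]");
        }
        return path.toString();
    }

    // Parses XML with namespace awareness switched on (required for getLocalName()).
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    // Evaluates an XPath expression and returns the first matching node, or null.
    static Node select(Document doc, String path) {
        try {
            return (Node) XPathFactory.newInstance().newXPath().evaluate(path, doc, XPathConstants.NODE);
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<a><b/><b xmlns='urn:x'><c/></b></a>");
        Element c = (Element) doc.getElementsByTagNameNS("*", "c").item(0);
        System.out.println(positionalPath(c));
        System.out.println(namespaceSafePath(c));
        // Round trip: the generated path should select the element it was built from.
        System.out.println(select(doc, namespaceSafePath(c)).isSameNode(c));
    }
}
```

Both paths can be evaluated without registering any prefix, which is the point of avoiding the /a/b/c/d style when namespaces may be present.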
Upvotes: 0
Reputation: 3833
Jsoup still doesn't support getting the XPath directly from an element.
There is still a pending implementation suggestion.
Upvotes: 2
Reputation: 9914
The following link explains how to apply XPath in Jsoup.
At the end of the article, the author comments:
"If you like to extract specific data from the HTML, then Jsoup is the way to go."
Upvotes: 0