Reputation: 3638
Here's some raw HTML (taken from a large file):
<h1 class="contentHeader">This is the header</h1>
Using JSoup's traverse method, I've gone through the DOM and located this element, along with its attributes:
doc.traverse(new NodeVisitor() {
    @Override
    public void head(Node node, int depth) {
        System.out.println(node);
        System.out.println("Node depth: " + depth);
        Attributes attrList = node.attributes();
        for (Attribute attr : attrList) {
            System.out.println(attr);
        }
        ....
    }
This produces:
<h1 class="contentHeader">This is the header</h1>
Node depth: 8
class="contentHeader"
What I'm now trying to do is write a single-line implementation for finding this element. I've been reading through the JSoup Cookbook, and it seems that this should be possible by using the :eq selector to specify a depth, but I'm having no luck. The best I can come up with is this:
System.out.println(doc.select("h1.contentHeader:eq(8)"));
But this outputs no data. I'm either missing something crucial, misunderstanding the API, or just being plain wrong.
Any input or advice would be greatly appreciated.
Upvotes: 0
Views: 964
Reputation: 12214
:eq is an index-based pseudo-selector. It comes from jQuery rather than standard CSS, and it is not used to select by depth. Here is how the jQuery documentation explains what :eq does:
The index-related selectors (:eq(), :lt(), :gt(), :even, :odd) filter the set of elements that have matched the expressions that precede them. They narrow the set down based on the order of the elements within this matched set. For example, if elements are first selected with a class selector (.myclass) and four elements are returned, these elements are given indices 0 through 3 for the purposes of these selectors.
Note that since JavaScript arrays use 0-based indexing, these selectors reflect that fact. This is why $( ".myclass:eq(1)" ) selects the second element in the document with the class myclass, rather than the first. In contrast, :nth-child(n) uses 1-based indexing to conform to the CSS specification.
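So, :eq is not about depth. Note that JSoup's version is narrower than jQuery's: according to the JSoup selector syntax, :eq(n) matches elements whose sibling index is n, so h1.contentHeader:eq(8) only matches an h1 that is the ninth element child of its parent.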
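To see why the original query came back empty, here is a minimal sketch, assuming jsoup is on the classpath. The markup and the class name EqDemo are made up for illustration, since the real HTML structure isn't shown. In this sample the h1 is the third element child of its parent, so its sibling index is 2:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class EqDemo {
    // Hypothetical markup: the h1 has two preceding element siblings,
    // so its 0-based sibling index is 2.
    static final String HTML =
        "<div><p>one</p><p>two</p>"
        + "<h1 class=\"contentHeader\">This is the header</h1></div>";

    // Parses the sample markup and returns how many elements the selector matches.
    static int countMatches(String css) {
        Document doc = Jsoup.parse(HTML);
        return doc.select(css).size();
    }

    public static void main(String[] args) {
        // Matches: the index equals the h1's sibling index.
        System.out.println(countMatches("h1.contentHeader:eq(2)")); // prints 1
        // No match: 8 is neither the sibling index nor treated as a depth.
        System.out.println(countMatches("h1.contentHeader:eq(8)")); // prints 0
    }
}
```

The second call mirrors the query from the question: the selector is valid, it just tests a position among siblings rather than a nesting level.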
But since your element has a class attribute, why not use it:
System.out.println(doc.select("h1.contentHeader"));
You can also write a very specific descendant selector for this node (this is just an example, since I don't know your HTML structure):
System.out.println(doc.select("body div .someClass div div h1.contentHeader"));
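If the nesting level itself matters, one possible approach is to select by class first and then filter on the number of ancestors. This is only a sketch under assumptions: the markup, the class name DepthDemo, and the helper findByAncestorCount are hypothetical. It relies on Element.parents(), which returns the element's ancestors but not the Document node, so as far as I can tell the count is one less than the depth that doc.traverse reports.

```java
import java.util.Optional;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class DepthDemo {
    // Hypothetical markup: after parsing, the h1's ancestors are
    // div, div, body and html, i.e. four elements.
    static final String HTML =
        "<div><div><h1 class=\"contentHeader\">This is the header</h1></div></div>";

    // Returns the first element matching the selector that has exactly
    // the given number of ancestor elements, if there is one.
    static Optional<Element> findByAncestorCount(Document doc, String css, int ancestors) {
        return doc.select(css).stream()
                  .filter(el -> el.parents().size() == ancestors)
                  .findFirst();
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);
        Element header = doc.select("h1.contentHeader").first();
        System.out.println(header.text());           // prints This is the header
        System.out.println(header.parents().size()); // prints 4
        System.out.println(findByAncestorCount(doc, "h1.contentHeader", 4).isPresent()); // prints true
    }
}
```

In most cases the plain class selector is enough; the ancestor filter is only worth adding when the same class appears at several nesting levels.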
Upvotes: 1