Reputation: 307
Say I have this html:
<!-- some comment -->
<div class="someDiv">
... other html
</div>
<!-- some comment 2 -->
<div class="someDiv">
... other html
</div>
I'm currently getting all divs where class == someDiv and scraping them for information. To do that I'm simply doing this:
Document doc = Jsoup.connect(url).get();
Elements elements = doc.select(".someDiv");
for (Element element : elements) {
//scrape stuff
}
Within the for loop, is there any way to get the comment tag found before the particular div.someDiv element I'm on?
If this isn't possible, should I go about parsing this html structure differently with this requirement?
Thanks for any advice.
Upvotes: 2
Views: 4038
Reputation: 37061
Try something like this: iterate over the child nodes, pick out the comments, and check whether each comment's next sibling is the div you are after.
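// Assumes the comments and divs are direct children of <body>;
// doc.childNodes() itself only holds the top-level nodes of the document.
for (Node child : doc.body().childNodes()) {
if (child.nodeName().equals("#comment")) {
// Check child.nextSibling() (e.g. with hasAttr or attr) to see whether it is the div you were expecting.
// Note that the next sibling may be a whitespace text node rather than the div itself.
}
}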
Take a look at the jsoup Node docs
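For what it's worth, here is a rough sketch of the same comment-first idea that does not depend on where the comments sit in the tree. It assumes a reasonably recent jsoup that provides Node.traverse and NodeVisitor; the class name CommentScan and the method commentsBeforeClass are made up for illustration and are not part of jsoup.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Comment;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.select.NodeVisitor;

import java.util.ArrayList;
import java.util.List;

public class CommentScan {

    // Hypothetical helper: collects the trimmed text of every comment whose
    // next element sibling carries the given class.
    static List<String> commentsBeforeClass(Document doc, String className) {
        List<String> found = new ArrayList<>();
        doc.traverse(new NodeVisitor() {
            @Override
            public void head(Node node, int depth) {
                if (!(node instanceof Comment)) {
                    return;
                }
                Node next = node.nextSibling();
                // Skip any non-element siblings (whitespace text, other comments)
                // until the next element is reached.
                while (next != null && !(next instanceof Element)) {
                    next = next.nextSibling();
                }
                if (next != null && ((Element) next).hasClass(className)) {
                    found.add(((Comment) node).getData().trim());
                }
            }

            @Override
            public void tail(Node node, int depth) {
                // Nothing to do when leaving a node.
            }
        });
        return found;
    }

    public static void main(String[] args) {
        String html = "<!-- some comment -->\n<div class=\"someDiv\">... other html</div>\n"
                + "<!-- some comment 2 -->\n<div class=\"someDiv\">... other html</div>";
        Document doc = Jsoup.parseBodyFragment(html);
        for (String text : commentsBeforeClass(doc, "someDiv")) {
            System.out.println(text);
        }
    }
}
```

Because the walk starts from the comments, a div without a preceding comment is simply never reported, which may or may not be what you want.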
Upvotes: 2
Reputation: 353
Though this question is a few months old, here is my answer for completeness. How about using previousSibling() to get the preceding Node? Of course, in real code you probably want to check whether you really get a Comment there.
String html = "<!-- some comment --><div class=\"someDiv\">... other html</div><!-- some comment 2 --><div class=\"someDiv\">... other html</div>";
Document doc = Jsoup.parseBodyFragment(html);
Elements elements = doc.select(".someDiv");
for (Element element : elements) {
System.out.println(((Comment) element.previousSibling()).getData());
}
This produces:
some comment
some comment 2
(tested with jsoup 1.6.1 and 1.6.3)
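One caveat: in the question's HTML there is a line break between each comment and its div, so previousSibling() will most likely return a whitespace text node there and the direct cast would fail. A minimal sketch of a more defensive variant follows; PrecedingComment and precedingComment are hypothetical names chosen for this example, and it assumes a jsoup version where TextNode.isBlank() is available.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Comment;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public class PrecedingComment {

    // Hypothetical helper: returns the trimmed text of the comment that
    // immediately precedes the element (ignoring whitespace), or null if
    // the nearest preceding node is not a comment.
    static String precedingComment(Element element) {
        Node node = element.previousSibling();
        // Step back over whitespace-only text nodes such as line breaks.
        while (node instanceof TextNode && ((TextNode) node).isBlank()) {
            node = node.previousSibling();
        }
        return (node instanceof Comment) ? ((Comment) node).getData().trim() : null;
    }

    public static void main(String[] args) {
        String html = "<!-- some comment -->\n<div class=\"someDiv\">... other html</div>\n"
                + "<!-- some comment 2 -->\n<div class=\"someDiv\">... other html</div>";
        Document doc = Jsoup.parseBodyFragment(html);
        for (Element element : doc.select(".someDiv")) {
            String comment = precedingComment(element);
            System.out.println(comment != null ? comment : "(no comment found)");
        }
    }
}
```

Returning null instead of throwing lets the loop carry on when a div has no comment in front of it; whether that is the right behaviour depends on how strict the scrape needs to be.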
Upvotes: 4