Reputation: 18876
I am trying to go through every HTML tag in a webpage's body and check whether it contains text. If it does, I would like to print that text:
Document doc = Jsoup.connect(site).get();
Elements e = doc.body().getAllElements();
for (int i = 0; i < e.size(); i++) {
    if (doc.body().child(i).hasText()) {
        System.out.println(doc.body().child(i).text());
    }
}
The code above works, but not the way I want. The child() method does not seem fine-grained enough: it clumps the text of multiple div elements together. How can I traverse the DOM's body in a more fine-grained manner to see the text of each individual tag?
Thank you in advance.
Upvotes: 0
Views: 379
Reputation: 2527
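import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;
import org.jsoup.select.NodeVisitor;

Document doc = Jsoup.connect(site).get();
// Visit every node under <body>, depth first
doc.body().traverse(new NodeVisitor() {
    @Override
    public void head(Node node, int depth) {
        // Only text nodes carry text of their own
        if (node instanceof TextNode) {
            TextNode tn = (TextNode) node;
            // Skip text nodes that contain only whitespace;
            // this filter is basic and can be refined further
            if (tn.text().replaceAll("\\s+", "").length() > 0) {
                System.out.println("Tag:" + tn.parent().nodeName()
                        + ", text:" + tn.text());
            }
        }
    }

    @Override
    public void tail(Node node, int depth) {
        // Do nothing
    }
});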
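One possible way to tighten the whitespace filter mentioned in the comment above is sketched below, using only the Java standard library. The class name VisibleTextFilter and the method hasVisibleText are hypothetical names made up for this illustration, not part of jsoup. The assumption is that text scraped from pages may contain non-breaking spaces (U+00A0, written as &nbsp; in HTML), which Java's default \s pattern does not match, so a regex-based check would still treat such a node as having text.

```java
final class VisibleTextFilter {

    // Hypothetical helper: returns true if the string has at least one
    // character that is neither Java whitespace (space, tab, newline, ...)
    // nor a Unicode space character such as U+00A0 (non-breaking space).
    static boolean hasVisibleText(String s) {
        if (s == null) {
            return false;
        }
        return s.codePoints().anyMatch(cp ->
                !Character.isWhitespace(cp) && !Character.isSpaceChar(cp));
    }

    public static void main(String[] args) {
        System.out.println(hasVisibleText("  \n\t"));   // false
        System.out.println(hasVisibleText("\u00A0"));   // false
        System.out.println(hasVisibleText(" hi "));     // true
    }
}
```

If this fits your case, the condition inside head() could become hasVisibleText(tn.text()) in place of the replaceAll check. Whether non-breaking spaces should count as empty depends on the pages you are reading, so treat it as a starting point.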
Upvotes: 1