Sergio Serra

Reputation: 1479

Jsoup: efficient way to remove HTML elements and their children

I want to remove the HTML div and table tags and everything inside them (their children). What's the best way to do it?

I tried traversing the document like this, but it isn't working. The Jsoup documentation says that node.remove() removes the node, along with its children, from the DOM:

doc.traverse(new NodeVisitor() {
    @Override
    public void head(Node node, int i) {
    }

    @Override
    public void tail(Node node, int i) {
        //Log.i(TAG,"node: "+node.nodeName());
        if (node.nodeName().compareTo("table") == 0 ||
                node.nodeName().compareTo("div") == 0)
            node.remove();
    }
});

Upvotes: 9

Views: 10599

Answers (2)

hubs

Reputation: 11

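Document doc = Jsoup.parse(html);
// Note: "table *" matches the elements nested inside each <table>, not the <table> itself,
// so this empties the tables but leaves the <table> tags in place.
doc.select("table *").remove();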

Upvotes: 1

ashatte

Reputation: 5538

Have you tried the remove() function of the Elements class?

Document doc = Jsoup.parse(html);
doc.select("div").remove();
doc.select("table").remove();

This should select and remove all <div> and <table> elements, together with everything nested inside them.
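As a minimal sketch of the same idea, the two calls can probably be folded into one by using a comma-separated selector. The class name RemoveDivsAndTables, the helper stripDivsAndTables, and the sample HTML are made up for illustration; only Jsoup.parse, select, Elements.remove, body and html are jsoup calls.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class RemoveDivsAndTables {

    // Hypothetical helper: parses the HTML, removes every <div> and <table>
    // (including all of their descendants), and returns what is left of the body.
    static String stripDivsAndTables(String html) {
        Document doc = Jsoup.parse(html);
        // "div, table" is a selector group: it matches elements that are a div OR a table.
        doc.select("div, table").remove();
        return doc.body().html();
    }

    public static void main(String[] args) {
        // Sample input (an assumption for this sketch): one paragraph to keep,
        // one div containing a nested table, and one top-level table.
        String html = "<p>keep</p>"
                + "<div><p>drop</p><table><tr><td>x</td></tr></table></div>"
                + "<table><tr><td>y</td></tr></table>";
        System.out.println(stripDivsAndTables(html));
    }
}
```

Because the matching elements are collected by select() before any of them is removed, nested cases (a table inside a div) should not cause the kind of trouble that removing nodes in the middle of a traversal can.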

Upvotes: 21
