Extract only HTML tags and attributes from an HTML string using Jsoup

Question

I want to keep only the HTML tags, along with their attributes, and remove the text.

Input String:

String html = "An 
   example 
this is  the  link ";

Output

Edit: Most of the questions on Google or Stack Overflow are only about removing the HTML and extracting the text. I spent around 3 hours arriving at the solutions below, so I am posting them here in case they help others.

Siva · Accepted Answer

I hope this helps anyone who, like me, wants to remove only the text content from an HTML string.

Output

String html = "An 
   example 
this is  the  link ";
       Traverser traverser = new Traverser();

       // Parser.xmlParser() keeps the input as-is; the default HTML parser also works, but it adds the html, head, and body tags
       Document document = Jsoup.parse(html, "", Parser.xmlParser());

       document.traverse(traverser);
       System.out.println(traverser.extractHtmlBuilder.toString());

Appending node.attributes() includes all of the element's attributes in the output.

    public static class Traverser implements NodeVisitor {

        StringBuilder extractHtmlBuilder = new StringBuilder();

        @Override
        public void head(Node node, int depth) {
            if (node instanceof Element && !(node instanceof Document)) {
                extractHtmlBuilder.append("<").append(node.nodeName()).append(node.attributes()).append(">");
            }
        }

        @Override
        public void tail(Node node, int depth) {
            if (node instanceof Element && !(node instanceof Document)) {
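                // Emit the closing tag once all children of this element have been visited
                extractHtmlBuilder.append("</").append(node.nodeName()).append(">");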
            }
        }
    }
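
To see why head emits the opening tag and tail emits the closing tag, here is a library-free sketch of the same depth-first traversal. It does not use Jsoup: MiniNode, MiniVisitor, and TagOnlyDemo are hypothetical stand-ins for Jsoup's Node and NodeVisitor, and the sample tree is made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TagOnlyDemo {

    // Hypothetical stand-in for a Jsoup node: tag == null marks a text node.
    static class MiniNode {
        final String tag;
        final String text;
        final Map<String, String> attrs = new LinkedHashMap<>();
        final List<MiniNode> children = new ArrayList<>();

        MiniNode(String tag, String text) {
            this.tag = tag;
            this.text = text;
        }

        static MiniNode element(String tag) { return new MiniNode(tag, null); }
        static MiniNode text(String text) { return new MiniNode(null, text); }

        MiniNode attr(String key, String value) { attrs.put(key, value); return this; }
        MiniNode add(MiniNode child) { children.add(child); return this; }
    }

    // Hypothetical visitor with the same head/tail shape as the Traverser above.
    interface MiniVisitor {
        void head(MiniNode node, int depth);
        void tail(MiniNode node, int depth);
    }

    // Depth-first walk: head before the children, tail after them.
    static void traverse(MiniNode node, MiniVisitor visitor, int depth) {
        visitor.head(node, depth);
        for (MiniNode child : node.children) {
            traverse(child, visitor, depth + 1);
        }
        visitor.tail(node, depth);
    }

    // Rebuilds the markup from tags and attributes only, skipping text nodes.
    static String tagsOnly(MiniNode root) {
        StringBuilder out = new StringBuilder();
        traverse(root, new MiniVisitor() {
            @Override
            public void head(MiniNode node, int depth) {
                if (node.tag == null) return; // text node: emit nothing
                out.append('<').append(node.tag);
                for (Map.Entry<String, String> e : node.attrs.entrySet()) {
                    out.append(' ').append(e.getKey())
                       .append("=\"").append(e.getValue()).append('"');
                }
                out.append('>');
            }

            @Override
            public void tail(MiniNode node, int depth) {
                if (node.tag == null) return; // text node: emit nothing
                out.append("</").append(node.tag).append('>');
            }
        }, 0);
        return out.toString();
    }

    public static void main(String[] args) {
        MiniNode root = MiniNode.element("p")
                .add(MiniNode.text("Some "))
                .add(MiniNode.element("a").attr("href", "#top")
                        .add(MiniNode.text("link")));
        System.out.println(tagsOnly(root)); // prints <p><a href="#top"></a></p>
    }
}
```

Because tail runs only after every child has been visited, the closing tags come out in the correct nesting order without any explicit stack.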

Another Solution:

    Document document = Jsoup.parse(html, "", Parser.xmlParser());
    for (Element element : document.select("*")) {
        if (!element.ownText().isEmpty()) {
            // textNodes() returns a copy of the list, so removing nodes while iterating is safe
            for (TextNode node : element.textNodes()) {
                node.remove();
            }
        }
    }
    System.out.println(document.toString());
