Tusar

Reputation: 23

Removing only an HTML tag while keeping the text inside it, using Jsoup

I want to remove only the inner "span" tag, without removing the text inside it:

    <blockquote>
          <span>I don’t even bring up technology.</span> 
              I talk about the flow of data.&rdquo;
          <cite>&ndash;Rick Hassman, CIO, Pella</cite>
    </blockquote>

After parsing, it should look like this:

    <blockquote>
            I don’t even bring up technology.
              I talk about the flow of data.&rdquo;
          <cite>&ndash;Rick Hassman, CIO, Pella</cite>
    </blockquote>

Please help.

Upvotes: 2

Views: 804

Answers (3)

Yevgen

Reputation: 1667

Use StringUtils#substringBetween from Apache Commons Lang; it might save you a lot of effort.

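    String spanText = StringUtils.substringBetween(source, "<span>", "</span>");
    // Replaces only the first attribute-free span; (?s) lets the match cross line breaks,
    // and quoteReplacement keeps any $ or \ in the text from being read as regex syntax.
    String result = spanText == null ? source
            : source.replaceFirst("(?s)<span>.*?</span>", java.util.regex.Matcher.quoteReplacement(spanText));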

Upvotes: 0

Luk

Reputation: 2246

The simplest way to solve this is to use the String.replaceAll() method.

    String newHtml = html.replaceAll("<\\/?\\s*span.*?>", "");
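
For reference, here is a slightly stricter variant of the same regex idea as a self-contained sketch. The class and method names (SpanTagStripper, stripSpanTags) are made up for illustration. Unlike the one-liner above, it uses a word boundary so that a tag such as <spanner> is left alone, and it does not accept whitespace between "<" and "span". It still assumes no attribute value contains a ">" character, which is one reason a real HTML parser is usually the safer choice.

```java
import java.util.regex.Pattern;

class SpanTagStripper {
    // Matches <span ...> and </span>, ignoring case. The \b stops tags
    // such as <spanner> from matching; [^>]* skips over any attributes.
    private static final Pattern SPAN_TAG =
            Pattern.compile("</?span\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // Removes the span tags themselves and leaves their contents untouched.
    static String stripSpanTags(String html) {
        return SPAN_TAG.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "<blockquote>"
                + "<span class=\"lead\">I don't even bring up technology.</span>"
                + " I talk about the flow of data."
                + "</blockquote>";
        System.out.println(stripSpanTags(html));
    }
}
```

Precompiling the pattern only matters if the method is called many times; for a one-off, the replaceAll() call does the same job.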

If you prefer to use Jsoup, then it gets more complicated:

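        // Needs: org.jsoup.Jsoup, org.jsoup.nodes.Document,
        //        org.jsoup.nodes.Element, org.jsoup.nodes.Node
        Document doc = Jsoup.parse(html);
        for (Element e : doc.select("span")) {
            Element parent = e.parent();
            // Skip spans whose parent was already swapped out by an earlier
            // iteration (sibling spans are handled together below).
            if (parent == null || parent.parent() == null) {
                continue;
            }
            // Rebuild the parent from scratch, copying its children one by one.
            Element newParent = parent.clone();
            newParent.empty();
            for (Node n : parent.childNodes()) {
                if (n instanceof Element && ((Element) n).tag().getName().equals("span")) {
                    // A direct child span contributes only its inner HTML.
                    newParent.append(((Element) n).html());
                } else {
                    newParent.append(n.outerHtml());
                }
            }
            parent.replaceWith(newParent);
        }
        // Caveat: a span nested inside another span is not unwrapped by this loop.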
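
A shorter alternative to the loop above, as a sketch: Jsoup has an unwrap() method on Elements that removes the matched elements and moves their children up into the parent. This assumes a Jsoup version that ships Elements.unwrap(); the class and method names (SpanUnwrapper, unwrapSpans) are just placeholders for illustration. Pretty-printing is switched off here so the surrounding whitespace is left as it was.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class SpanUnwrapper {
    // Drops every <span> element but keeps its children where they were.
    static String unwrapSpans(String html) {
        // parseBodyFragment places the snippet inside doc.body(),
        // so the result can be read back without the html/head wrapper.
        Document doc = Jsoup.parseBodyFragment(html);
        doc.outputSettings().prettyPrint(false);
        doc.select("span").unwrap();
        return doc.body().html();
    }

    public static void main(String[] args) {
        String html = "<blockquote>"
                + "<span>I don't even bring up technology.</span>"
                + " I talk about the flow of data."
                + "<cite>Rick Hassman, CIO, Pella</cite>"
                + "</blockquote>";
        System.out.println(unwrapSpans(html));
    }
}
```

Because the parser does the work, attributes on the span and nested spans do not need special handling the way they do with a regex.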

Upvotes: 3

BillIT

Reputation: 53

If your tag is correct and you are asking how to do this in Java...

    String hi = "Hello World!";
    String no_o = hi.replaceAll("o", "");

...should help.

Upvotes: 0
