Unescape XML tags only. Keep content escaped

I have to consume a web service (WS) that sends its XML data inside a CDATA section. The output I get is the following:

&lt;parent>
    &lt;child1>
        &lt;xmltag1>4 años &lt; 8 &lt;/xmltag1>
        &lt;xmltag2>3 años &lt; 12 &lt;/xmltag2>
    &lt;/child1>
&lt;/parent>

I need to turn this data into usable XML so I can work with it.

It should look like this:

<parent>
    <child1>
        <xmltag1>4 años &lt; 8 </xmltag1>
        <xmltag2>3 años &lt; 12 </xmltag2>
    </child1>
</parent>

I have tried various Java functions, such as this one, without getting usable output:

StringEscapeUtils.unescapeXml(string);

There may be a way to get that result with a regex. This is what I have so far, but regex is not my strength:

string.replaceAll("&lt;{0}>", "</{0}>");
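That attempt leaves the input unchanged because, in a Java regex, {0} is a quantifier rather than a placeholder: ;{0} means "zero semicolons", so the pattern "&lt;{0}>" only matches the literal text "&lt>", and the replacement "</{0}>" is inserted literally. A minimal check, assuming standard java.util.regex semantics (the class and method names are made up for illustration):

```java
public class AttemptCheck {

    // Hypothetical helper that applies the attempted pattern exactly as written.
    static String asAttempted(String s) {
        // ";{0}" is ';' repeated zero times, so the pattern is effectively "&lt>"
        return s.replaceAll("&lt;{0}>", "</{0}>");
    }

    public static void main(String[] args) {
        // "&lt" is followed by ";parent>", not ">", so nothing matches
        System.out.println(asAttempted("&lt;parent>"));
        // only the bare text "&lt>" matches; "{0}" in the replacement is not special
        System.out.println(asAttempted("&lt>"));
    }
}
```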

Upvotes: 2

Views: 630

Answers (1)

Wiktor Stribiżew

You can use

String fixedXml = text.replaceAll("&lt;(/?\\w+(?:\\s[^>]*)?>)", "<$1");

See the regex demo. Details:

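  • &lt; - the literal string &lt;
  • (/?\w+(?:\s[^>]*)?>) - Group 1 ($1):
    • /? - an optional / char
    • \w+ - one or more word chars (the tag name)
    • (?:\s[^>]*)? - an optional sequence of a whitespace char followed by zero or more chars other than > (the attributes)
    • > - a > char.

Because the &lt; in the text content (as in 4 años &lt; 8) is followed by a space rather than a word char, it is not matched and stays escaped.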
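A self-contained sketch that applies the replacement above to the sample payload and then parses the result with the JDK's built-in DOM parser to confirm it is well-formed. It assumes the raw payload writes tags as &lt;name> with a literal >, which is what the pattern expects; the class and method names are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class UnescapeTags {

    // Same pattern as the answer: "&lt;", then an optional "/", a tag name,
    // optional whitespace-led attributes up to the next ">", and the ">" itself.
    // Only the "&lt;" is replaced; everything captured in group 1 is kept.
    static String unescapeTags(String text) {
        return text.replaceAll("&lt;(/?\\w+(?:\\s[^>]*)?>)", "<$1");
    }

    public static void main(String[] args) throws Exception {
        // Assumed raw payload: tags written as "&lt;name>", text "<" written as "&lt;".
        // The sample's n-with-tilde is a Unicode escape so the file compiles under any source encoding.
        String text = String.join("\n",
                "&lt;parent>",
                "    &lt;child1>",
                "        &lt;xmltag1>4 a\u00f1os &lt; 8 &lt;/xmltag1>",
                "        &lt;xmltag2>3 a\u00f1os &lt; 12 &lt;/xmltag2>",
                "    &lt;/child1>",
                "&lt;/parent>");

        String fixedXml = unescapeTags(text);
        System.out.println(fixedXml);

        // Parsing succeeds only if the result is well-formed XML; the parser
        // decodes the "&lt;" left in the text content back to "<".
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(fixedXml)));
        String value = doc.getElementsByTagName("xmltag1").item(0).getTextContent();
        System.out.println("xmltag1 text: [" + value + "]");
    }
}
```

After parsing, getTextContent() returns the text with &lt; decoded back to <, which is usually the form you want to work with. One limitation of the pattern: a &lt; in the text that is directly followed by a word char, then whitespace, and later a > is treated as a tag, so &lt;8 años&lt;/xmltag1> would become <8 años&lt;/xmltag1>. The approach relies on the content never looking like markup.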

Upvotes: 1
