Reputation: 885
I'm looking for a regex-based String replacement in Java for the use case below. I'm doing some Groovy-based XML processing, and because of some custom processing (I won't go into detail on this), the resulting XML has some invalid tags, e.g.:
<?xml version='1.0' encoding='UTF-8'?>
<Customer id="xyz" xmlns='http://abc.com'>
<order orderGroup="mock">
<entry>
<key>test</key>
</entry>
</order orderGroup="mock">
</Customer id="xyz">
Note that the end tags of elements that have attributes are messed up: the attributes are repeated in the end tag. The XML is treated as a plain string, so I just want to fix such end tags with a regex-based string replacement, e.g. replace
</order orderGroup="mock"> with </order>,
</Customer id="xyz"> with </Customer>
Is there a quick Java String regex I can use for these replacements?
Thanks.
Upvotes: 1
Views: 1555
Reputation: 95518
The easiest solution is to fix your custom XML processing and have it generate valid XML.
The easy solution is to use something like JTidy to clean up your XML.
If you must use regex, you could try something like this:
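import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern pattern = Pattern.compile("</([A-Za-z]+) [^>]+>");
Matcher matcher = pattern.matcher(xml);
// $1 is the captured tag name; rebuild a valid end tag around it.
// replaceAll leaves the input unchanged when nothing matches, so no find() check is needed.
xml = matcher.replaceAll("</$1>");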
I haven't tested this out, so keep that in mind. There might be a few issues.
Explanation of the regex:
< -> The opening angle bracket of the tag
/ -> The / that marks a closing tag
( -> Start of a capturing group. We want to capture the tag name so it can be reused in the replacement.
[A-Za-z]+ -> One or more alphabetic characters (upper and lowercase)
) -> End of the capturing group.
(space) -> A single space between the tag name and the stray attributes.
[^>]+ -> One or more of anything that is not a closing angle-bracket.
> -> The closing angle bracket of the tag.
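Putting the pieces of the regex together, here is a minimal, self-contained sketch of the replacement. The class name EndTagFixer and the method fixEndTags are made up for illustration and are not part of any library. It assumes tag names are purely alphabetic, as in the sample XML.

```java
import java.util.regex.Pattern;

public class EndTagFixer {
    // "</", a captured alphabetic tag name, one space, then anything up to the closing ">".
    private static final Pattern BAD_END_TAG = Pattern.compile("</([A-Za-z]+) [^>]+>");

    // Hypothetical helper: rewrites every malformed end tag as "</name>",
    // where "name" is the text captured by group 1.
    static String fixEndTags(String xml) {
        return BAD_END_TAG.matcher(xml).replaceAll("</$1>");
    }

    public static void main(String[] args) {
        String xml = "<Customer id=\"xyz\"><order orderGroup=\"mock\"><key>test</key>"
                + "</order orderGroup=\"mock\"></Customer id=\"xyz\">";
        // Well-formed end tags such as </key> are left alone because they contain no space.
        System.out.println(fixEndTags(xml));
    }
}
```

Because the name part is [A-Za-z]+, end tags whose names contain digits, hyphens, or a namespace prefix (for example ns:order) would not be matched. If your XML has such names, the character class would need widening.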
Upvotes: 2