Reputation: 55924
I'd like to remove certain tags from an XML document as part of a filtering process but I cannot otherwise modify the appearance or structure of the XML.
The input XML comes in as a string, e.g.:
<?xml version="1.0" encoding="UTF-8"?>
<main>
<mytag myattr="123"/>
<mytag myattr="456"/>
</main>
and the output needs to remove mytag where the attribute value is, say, 456:
<?xml version="1.0" encoding="UTF-8"?>
<main>
<mytag myattr="123"/>
</main>
A diff should show only the removed tags as differences between the input and output.
I've looked into SAX, StAX and JAXB, but it doesn't look like any of these APIs can output XML in the same format as it was input. They instead produce well-structured XML with their own indentation and whitespace, which will sometimes show up as differences from the input.
My current method uses regular expressions but is not very robust as it doesn't consider all the possible ways of structuring the above XML. For example, to match the attribute value:
myattr\s*=\s*"([^"]*)"
This works on the example above, but won't work given this XML tag:
<mytag myattr=
123></mytag>
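A quick way to check this behaviour is a small Java sketch like the one below. It assumes the pattern is written with the lowercase attribute name from the sample XML; the class name AttrRegexCheck and the single-quoted test case are illustrative additions, not part of my actual code.

```java
import java.util.regex.Pattern;

final class AttrRegexCheck {
    // The pattern from above, using the lowercase attribute name of the sample XML.
    static final Pattern ATTR = Pattern.compile("myattr\\s*=\\s*\"([^\"]*)\"");

    // Returns true if the pattern finds a double-quoted myattr value in the tag.
    static boolean matches(String tag) {
        return ATTR.matcher(tag).find();
    }

    public static void main(String[] args) {
        // Tag exactly as in the first example: the pattern finds the value.
        System.out.println(matches("<mytag myattr=\"123\"/>"));
        // Tag split across lines with no double quotes: the pattern finds nothing.
        System.out.println(matches("<mytag myattr=\n123></mytag>"));
        // Hypothetical extra case: XML also allows single-quoted values, which are missed.
        System.out.println(matches("<mytag myattr='123'/>"));
    }
}
```

Each new way of writing the tag would need another special case in the pattern, which is why this approach feels fragile.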
Are regular expressions really the best option in this situation?
Upvotes: 2
Views: 146
Reputation: 60448
Don't use regular expressions to parse XML! You already know what happens when you try, and I have a spiel on why this is.
In your case you should use XSLT. An XSLT file to do what you want is very simple and easy to follow. It's basically the following:
<xsl:template match="mytag[@myattr=456]">
</xsl:template>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()" />
</xsl:copy>
</xsl:template>
This copies every node unchanged (elements, attributes, text and whitespace), except any mytag element whose myattr attribute is 456, which the empty template drops.
I tested it on your example file and got the output you said you wanted.
Now, as for how you use XSLT with Java, it looks like an entire book has been written on the subject. You can probably use whichever XML library is your favourite. I've never actually used XSLT with Java myself, so I can't tell you which library is easiest to use.
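For what it's worth, the JDK ships with the javax.xml.transform API, so a minimal sketch might look like the following. Treat it as an untuned starting point rather than a recommendation: the class name MyTagFilter and the filter method are made up for illustration, and the embedded stylesheet is an identity-transform variant (using @*|node() so text nodes are copied too) that drops mytag elements whose myattr is 456, as the question asks.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class MyTagFilter {
    // Identity transform plus an empty template that drops the unwanted elements.
    private static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:template match=\"mytag[@myattr=456]\"/>"
      + "<xsl:template match=\"@*|node()\">"
      + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to an XML string and returns the transformed XML.
    static String filter(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // Wrap the checked exception so callers are not forced to handle it.
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String input = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<main>\n"
            + "<mytag myattr=\"123\"/>\n"
            + "<mytag myattr=\"456\"/>\n"
            + "</main>";
        System.out.println(filter(input));
    }
}
```

Keep in mind that the output is re-serialized by the transformer, so details inside a tag (quote style, spacing between attributes) and the whitespace left around a removed element are decided by the serializer rather than copied from the input; run a diff on your real data to see whether that is acceptable.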
Upvotes: 5