Reputation: 6847
import groovy.xml.XmlUtil

def xmlNode = new XmlSlurper().parseText('<?xml version="1.0" encoding="utf-8"?><b>&#8240;</b>')
println XmlUtil.serialize(xmlNode)
This prints:
<?xml version="1.0" encoding="UTF-8"?><b>‰</b>
Is there a way to prevent &#8240; from being converted into ‰? The XmlSlurper documentation says nothing about it.
Upvotes: 1
Views: 622
Reputation: 14529
I wrote a proof of concept that overrides XmlSlurper.characters to handle the character entity. StringEscapeUtils from Apache Commons Lang was also needed to convert ‰ back to its entity code:
@Grab(group='commons-lang', module='commons-lang', version='2.6')
import org.apache.commons.lang.StringEscapeUtils as SE
import groovy.xml.XmlUtil

def parser = new XmlSlurper() {
    void characters(char[] buffer, int start, int length) {
        // Re-escape the first character of the chunk (‰ becomes &#8240;)
        def entity = SE.escapeXml(buffer[start].toString())
        // The escaped text is a new array, so it is read from offset 0
        super.characters entity.toCharArray(), 0, entity.size()
    }
}

def xml = parser.parseText '<?xml version="1.0" encoding="utf-8"?><b>&#8240;</b>'
// serialize escapes the ampersand again, so unescape once to get the entity back
def serialized = SE.unescapeXml( XmlUtil.serialize(xml) )
assert '<?xml version="1.0" encoding="UTF-8"?><b>&#8240;</b>\n' == serialized
Note that this handles only a single character; you may need to tweak it a bit if you need multi-character handling. Also note that the expected string in the assert needs a trailing line break, which is added by XmlUtil.serialize.
I have no idea whether this is the best way to do it, though.
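For the multi-character case, one possible tweak is to escape the whole range passed to characters rather than only the character at the start offset. The following is only a sketch under a few assumptions: escapeRange and restoreRefs are hypothetical helper names, and the escaping is hand-rolled instead of using StringEscapeUtils so that it runs without @Grab. It only rewrites non-ASCII characters and leaves &, < and > alone, since the serializer escapes those itself.

```groovy
// Hypothetical helper: replace every non-ASCII character in the given range
// with a decimal character reference such as &#8240;.
String escapeRange(char[] buffer, int start, int length) {
    def text = new String(buffer, start, length)
    def out = new StringBuilder()
    int i = 0
    while (i < text.length()) {
        int cp = text.codePointAt(i)      // work on code points so surrogate pairs stay together
        if (cp > 0x7f) {
            out.append('&#').append(cp).append(';')
        } else {
            out.appendCodePoint(cp)
        }
        i += Character.charCount(cp)
    }
    out.toString()
}

// Hypothetical helper: the serializer turns "&#8240;" in text into "&amp;#8240;",
// so undo that for numeric references only and leave other entities untouched.
String restoreRefs(String serialized) {
    serialized.replaceAll(/&amp;#(\d+);/, '&#$1;')
}

// \u2030 is the per mille sign, \u20ac is the euro sign
def chars = 'a\u2030b\u20ac'.toCharArray()
assert escapeRange(chars, 0, chars.length) == 'a&#8240;b&#8364;'
assert escapeRange(chars, 1, 1) == '&#8240;'
assert restoreRefs('<b>&amp;#8240; &lt;</b>') == '<b>&#8240; &lt;</b>'
```

Inside the overridden characters method this could presumably be wired in by passing escapeRange(buffer, start, length).toCharArray() to super.characters with offset 0 and the new length, and then calling restoreRefs on the result of XmlUtil.serialize. Two caveats: the parser may deliver text in several chunks, so a surrogate pair split across two calls would not be handled, and restoreRefs would also rewrite text that literally contained an escaped ampersand followed by #digits; in the source document.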
Upvotes: 2