Reputation: 14640
I'm writing code that retrieves XML from a web API and then parses that XML using Groovy. Unfortunately, it seems that both XmlParser and XmlSlurper strip newline characters from the attributes of nodes when .text() is called.
How can I get at the text of the attribute including the newlines?
Sample code:
def xmltest = '''
<snippet>
<preSnippet att1="testatt1" code="This is line 1
This is line 2
This is line 3" >
<lines count="10" />
</preSnippet>
</snippet>'''
def parsed = new XmlParser().parseText( xmltest )
println "Parsed"
parsed.preSnippet.each { pre ->
println pre.attribute('code');
}
def slurped = new XmlSlurper().parseText( xmltest )
println "Slurped"
slurped.children().each { preSnip ->
println preSnip.@code.text()
}
The output of which is:
Parsed
This is line 1 This is line 2 This is line 3
Slurped
This is line 1 This is line 2 This is line 3
Ok, I was able to convert the text before I parsed it, then re-convert after, a la:
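def newxml = xmltest.replaceAll( /code="[^"]*/ ) {
return it.replaceAll( /\n/, "~#~" )
}
// Parse the converted text (newxml), not the original xmltest
def parsed = new XmlParser().parseText( newxml )
parsed.preSnippet.each { pre ->
// Restore the newlines after parsing
def code = pre.attribute('code').replaceAll( "~#~", "\n" )
println code
}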
Not my favorite hack, but it'll do until they fix their XML output.
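A variation on the same pre-processing idea, offered as a sketch rather than a definitive fix: instead of a placeholder token, the newlines inside the attribute can be rewritten as the character reference `&#10;`. Attribute-value normalization leaves character references alone, so no second substitution is needed after parsing. The sketch assumes Groovy 3 or later (where XmlSlurper lives in `groovy.xml`; on 2.x the import can be dropped), and that the `code` value is double-quoted and contains no `"` characters.

```groovy
import groovy.xml.XmlSlurper  // assumption: Groovy 3+; on 2.x the class is groovy.util.XmlSlurper

def xmltest = '''
<snippet>
<preSnippet att1="testatt1" code="This is line 1
This is line 2
This is line 3" >
<lines count="10" />
</preSnippet>
</snippet>'''

// Rewrite literal newlines inside code="..." as &#10; before parsing.
// The closure receives each whole match (the attribute name and its quoted value).
def escaped = xmltest.replaceAll(/code="[^"]*"/) { String attr ->
    attr.replace('\n', '&#10;')
}

def slurped = new XmlSlurper().parseText(escaped)
def code = slurped.preSnippet.@code.text()
println code
```

This is also the form the upstream service would need to emit if the line breaks are meant to survive in an attribute.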
Upvotes: 1
Views: 2336
Reputation: 1
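I think you are misreading the XML spec. Newlines are allowed in attribute values, but the parser normalizes them: each literal line break is replaced with a space for every attribute type, and if the declared type is anything other than CDATA (for example, one of the tokenized types), leading and trailing spaces are also stripped and runs of spaces are collapsed to one.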
Upvotes: 0
Reputation: 5405
Literal newlines are not preserved in attribute values; this comes from the XML specification. They end up 'normalised', which in this case means each one gets replaced with a space character. See this section of the spec: http://www.w3.org/TR/REC-xml/#AVNormalize
My team had this problem and our solution was to switch to using elements rather than attributes.
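A minimal sketch of what the element-based approach might look like, using a hypothetical reshaping of the question's sample in which the multi-line value moves into a `<code>` child element (that element name is an assumption for illustration). It assumes Groovy 3 or later for the `groovy.xml.XmlSlurper` import.

```groovy
import groovy.xml.XmlSlurper  // assumption: Groovy 3+; on 2.x the class is groovy.util.XmlSlurper

// Hypothetical layout: the value lives in element content, which is not
// subject to attribute-value normalization.
def xmltest = '''
<snippet>
<preSnippet att1="testatt1">
<code>This is line 1
This is line 2
This is line 3</code>
<lines count="10" />
</preSnippet>
</snippet>'''

def slurped = new XmlSlurper().parseText(xmltest)
def code = slurped.preSnippet.code.text()
println code
```

If the real content can contain `<` or `&`, it would need to be escaped or wrapped in a CDATA section.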
Upvotes: 2