Alex Deba

Reputation: 143

Groovy: parsing XML with HTML tags inside

My question is about parsing XML where string values have HTML tags inside:

def xmlString = '''
<resource>
   <string name="my_test">No problem here!</string>
   <string name="my_text">
<b> <big>My bold and big title</big></b>
   Rest of the text
  </string>
</resource>
'''

(it's an Android resource file)

When I use an XmlSlurper, the HTML tags are removed. This code:

def resources = new XmlSlurper().parseText(xmlString)
resources.string.each { string ->
    println "string name = " + string.@name + ", string value = " + string.text()
}

will produce

string name = my_test, string value = No problem here!
string name = my_text, string value = My bold and big title
   Rest of the text

I could use CDATA to prevent the HTML tags from being parsed, but then these HTML tags would not be processed when the string my_text is used.

I also tried using a StreamingMarkupBuilder, as explained in this SO answer: "How to extract HTML Code from a XML File using groovy", but then only the HTML tags and the text between them are displayed:

<b><big>My bold and big title</big></b>

and the first string is not displayed. Thanks in advance!

Upvotes: 2

Views: 1846

Answers (1)

user898650

def xmlString = '''
<resource>
    <string name="my_test">No problem here!</string>
    <string name="my_text">
        <b><big>My bold and big title</big></b>
        Rest of the text
    </string>
</resource>
'''

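def result = []
def resources = new XmlSlurper().parseText(xmlString).string

// For each <string> element, re-serialize its body so that nested tags
// such as <b> and <big> are kept instead of being flattened by text().
resources.each { resource ->
    def markup = new groovy.xml.StreamingMarkupBuilder().bind { mkp.yield resource.getBody() }
    result << markup.toString()
}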
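
To get the same name/value pairing the question prints, the serialized body can be keyed by each element's name attribute. This is only a minimal sketch: the map name `values` is made up for illustration, the XML is a condensed copy of the sample above, and the explicit `groovy.xml.XmlSlurper` import assumes Groovy 3 or later (on older versions the class lives in `groovy.util` and needs no import).

```groovy
import groovy.xml.StreamingMarkupBuilder
import groovy.xml.XmlSlurper   // assumption: Groovy 3+; drop this import on Groovy 2.x

def xmlString = '''
<resource>
    <string name="my_test">No problem here!</string>
    <string name="my_text"><b><big>My bold and big title</big></b> Rest of the text</string>
</resource>
'''

def resources = new XmlSlurper().parseText(xmlString).string

// Hypothetical helper map: name attribute -> body with its markup preserved.
def values = resources.collectEntries { resource ->
    // getBody() returns a closure that replays the element's children
    // (text and nested elements) into the builder.
    def markup = new StreamingMarkupBuilder().bind { mkp.yield resource.getBody() }
    [(resource.@name.text()): markup.toString()]
}

values.each { name, value ->
    println "string name = ${name}, string value = ${value}"
}
```

Whitespace around the inner text may differ from the original file, since XmlSlurper drops whitespace-only text nodes by default.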

Upvotes: 1
