Groovy XmlSlurper get value out of NodeChildren

Question

I'm parsing HTML and trying to get the full, unparsed content of one particular node.

HTML example:


    
        Hello 
 World 
 !

Code:

def tagsoupParser = new org.ccil.cowan.tagsoup.Parser()
def slurper = new XmlSlurper(tagsoupParser)
def htmlParsed = slurper.parseText(stringToParse)

println htmlParsed.body.div[0]

However, this returns only the text of the first node, and I get an empty string for the second node. Question: how can I retrieve the value of the first node so that I get:

Hello 
 World 
 !
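
For context, here is a minimal sketch of why println shows only text: printing a GPathResult calls toString(), which returns the same string as text(), so child markup is dropped. The input string below is an illustrative, well-formed stand-in (an assumption, not the actual page), and it uses the default parser rather than TagSoup so that it runs without extra dependencies.

```groovy
import groovy.xml.*

// Illustrative well-formed input (an assumption, not the asker's actual HTML).
def xml = '<root><div>Hello <br/> World <br/> !</div></root>'
def root = new XmlSlurper().parseText(xml)

def div = root.div[0]

// toString() on a GPathResult yields the concatenated text only.
def asText = div.toString()

// Serializing the node keeps the child elements.
def asMarkup = new StreamingMarkupBuilder().bindNode(div).toString()

assert asText == div.text()
assert !asText.contains('<br')
assert asMarkup.contains('<br')
```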

Nick Grealy · Accepted Answer

This is what I used to get the content of the first div tag (omitting the XML declaration and namespaces).

Groovy

@Grab('org.ccil.cowan.tagsoup:tagsoup:1.2.1')
import org.ccil.cowan.tagsoup.Parser
import groovy.xml.*

def html = """
    
        Hello 
 World 
 !
        
    
"""

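def parser = new Parser()
// Turn off namespace handling so the output carries no namespace prefixes
parser.setFeature('http://xml.org/sax/features/namespaces', false)
def root = new XmlSlurper(parser).parseText(html)
// Serialize the first div, markup included, instead of printing only its text
println new StreamingMarkupBuilder().bindNode(root.body.div[0]).toString()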

Gives

Hello 

 World 

 !

N.B. Unless I'm mistaken, TagSoup is adding the closing tags. If you literally want Hello World !, you might have to use a different library (maybe regex?).

I know it's including the div element in the output... is this a problem?
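
If the enclosing element is unwanted, one option is to trim it from the serialized string afterwards. The sketch below is plain string handling, not part of the XmlSlurper or TagSoup API; stripOuterElement is a hypothetical helper, and the sample input is an assumed shape of the serialized output.

```groovy
// Hypothetical helper: removes one enclosing <tag ...>...</tag> pair
// from a serialized fragment and returns whatever sat between them.
String stripOuterElement(String xml) {
    int openEnd = xml.indexOf('>')          // end of the opening tag
    int closeStart = xml.lastIndexOf('</')  // start of the last closing tag
    // Leave the input untouched if it does not look like <tag>...</tag>.
    if (openEnd < 0 || closeStart <= openEnd) {
        return xml
    }
    return xml.substring(openEnd + 1, closeStart)
}

// Assumed example of a serialized div (illustrative only).
def serialized = '<div>Hello <br></br> World <br></br> !</div>'
def inner = stripOuterElement(serialized)

assert inner == 'Hello <br></br> World <br></br> !'
```

This only handles a single wrapping element around the whole string; anything more involved is better done on the parsed tree than on the text.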
