user308808

Strange behavior with tagsoup and Groovy's XmlSlurper

Let's say I want to parse the phone number from an XML string like this:

str = """ <root> 
            <address>123 New York, NY 10019
                <div class="phone"> (212) 212-0001</div> 
            </address> 
        </root> 
    """
parser = new XmlSlurper(new org.ccil.cowan.tagsoup.Parser()).parseText (str)
println parser.address.div.text()

It doesn't print the phone number.

If I change the "div" element to "foo" like this:

str = """ <root> 
            <address>123 New York, NY 10019
                <foo class="phone"> (212) 212-0001</foo> 
            </address> 
        </root> 
    """
parser = new XmlSlurper(new org.ccil.cowan.tagsoup.Parser()).parseText (str)
println parser.address.foo.text()

Then it's able to parse and print the phone number.

What the heck is going on?

By the way, I am using Groovy 1.7.5 and TagSoup 1.2.

Upvotes: 3

Views: 816

Answers (3)

DataScientYst

Reputation: 442

I know this question is very old, but I ran into the same problem recently, and this is what I used:

parser.'**'.findAll { it.name() == 'div' && it.@class.text() == 'phone' }.each { div ->
    println div.text()
}
  1. Use '**' (shorthand for depthFirst) to visit every tag.
  2. Keep the tags named div whose class attribute is phone.
  3. Print the text of each match: (212) 212-0001

Groovy version is 2.4
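
For reference, the same depth-first search can be tried on its own, without TagSoup. This is only a sketch under a few assumptions: it uses the stock XmlSlurper on well-formed input, it targets Groovy 3 or later (on Groovy 2.x, drop the import, since groovy.util.XmlSlurper is imported by default), and the variable names are illustrative.

```groovy
import groovy.xml.XmlSlurper  // assumption: Groovy 3+; on 2.x remove this line

// Same document as in the question, parsed with the default SAX parser (no TagSoup).
def xml = '''<root>
  <address>123 New York, NY 10019
    <div class="phone"> (212) 212-0001</div>
  </address>
</root>'''

def root = new XmlSlurper().parseText(xml)

// '**' is shorthand for depthFirst(): it walks every element below the root.
def matches = root.'**'.findAll { it.name() == 'div' && it.@class.text() == 'phone' }

// Collect the trimmed text of each matching element.
def phones = matches.collect { it.text().trim() }

phones.each { println it }
```

Because the search does not depend on where the div sits in the tree, it should keep working even if the parser rearranges the nesting.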

Upvotes: 0

winstaan74

Reputation: 1131

I seem to recall that tagsoup normalizes HTML tags - i.e. it uppercases them. So the GPath expression you want is probably

println parser.ADDRESS.DIV.text()

I find it handy to print out the result of the parse, because then you can see why your GPath isn't working. Use this:

println groovy.xml.XmlUtil.serialize(parser)
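
As a rough sketch of that debugging tip, the snippet below serializes a parsed document so that the element names and nesting the parser actually produced become visible. It assumes the stock XmlSlurper on Groovy 3 or later instead of TagSoup; to inspect TagSoup's output, pass its parser to the XmlSlurper constructor as in the question.

```groovy
import groovy.xml.XmlSlurper  // assumption: Groovy 3+; on 2.x remove this line
import groovy.xml.XmlUtil

def doc = new XmlSlurper().parseText(
    '<root><address>123 New York, NY 10019<div class="phone"> (212) 212-0001</div></address></root>')

// serialize() turns the parsed tree back into an XML string.
def dump = XmlUtil.serialize(doc)

// Compare the printed element names and nesting with the GPath expression.
println dump
```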

Upvotes: 0

oiavorskyi

Reputation: 2941

Just change the code to:

println parser.address.'div'.text()

This is the curse of Groovy and many other dynamic languages: "div" is a reserved method name, so you don't get the node but instead try to divide the "address" node :)

Upvotes: 1
