Reputation: 1017
I am using the Validator.nu HTML parser to parse an HTML string:
import nu.validator.htmlparser.{sax,common}
import sax.HtmlParser
import common.XmlViolationPolicy
val source = Source.fromString(response)
val html = new models.HTML5Parser
val htmlObject = html.loadXML(source)
How do I pull values for specific elements in the object? I can get the child and the label using this:
val child = htmlObject.child(1).label
But I don't know how to get the content of the child. Also, I don't know how to iterate through the child objects.
Upvotes: 2
Views: 2482
Reputation: 139038
It's unclear where your HTML5Parser class comes from, but I'm going to assume it's the one in this example (or something similar). In that case your htmlObject is just a scala.xml.Node. First for some setup:
val source = Source.fromString(
"<html><head/><body><div class='main'><span>test</span></div></body></html>"
)
val htmlObject = html.loadXML(source)
Now you can do the following, for example:
scala> htmlObject.child(1).label
res0: String = body
scala> htmlObject.child(1).child(0).child(0).text
res1: String = test
scala> (htmlObject \\ "span").text
res2: String = test
scala> (htmlObject \ "body" \ "div" \ "span").text
res3: String = test
scala> (htmlObject \\ "div").head.attributes.asAttrMap
res4: Map[String,String] = Map(class -> main)
And so on.
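As for iterating through the children: child on a scala.xml.Node is an ordinary sequence, so the usual collection methods work on it. Here is a minimal sketch, assuming the scala-xml module is on the classpath. To keep it self-contained it parses the same markup with scala.xml.XML.loadString rather than your HTML5Parser, and the names ChildIteration, childLabels and describe are only made up for illustration:

```scala
import scala.xml.{Elem, Node, XML}

object ChildIteration {
  // Stand-in for html.loadXML(source): the markup is well-formed,
  // so the plain scala.xml loader is enough for this illustration.
  val htmlObject: Elem = XML.loadString(
    "<html><head/><body><div class='main'><span>test</span></div></body></html>"
  )

  // Labels of the direct children, keeping only element nodes
  // (text and whitespace nodes are skipped by the pattern match).
  val childLabels: List[String] =
    htmlObject.child.collect { case e: Elem => e.label }.toList

  // Walk the tree recursively and pair every descendant element's
  // label with its text content.
  def describe(node: Node): List[(String, String)] =
    node.child.toList.flatMap {
      case e: Elem => (e.label, e.text) :: describe(e)
      case _       => Nil
    }

  def main(args: Array[String]): Unit = {
    childLabels.foreach(println)
    describe(htmlObject).foreach { case (label, text) =>
      println(s"$label -> $text")
    }
  }
}
```

The same pattern should apply to the Node returned by loadXML. Matching on Elem matters with real pages, because whitespace between tags shows up as text nodes in child.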
Upvotes: 3