user1491739

Reputation: 1017

Scala HTML parser object usage

I am using the HTML parser to parse an HTML string:

import nu.validator.htmlparser.{sax,common}
import sax.HtmlParser
import common.XmlViolationPolicy
import scala.xml.Source  // assumed: loadXML takes an org.xml.sax.InputSource

val source = Source.fromString(response)
val html = new models.HTML5Parser
val htmlObject = html.loadXML(source)

How do I pull values for specific elements in the object? I can get the child and the label using this:

val child = htmlObject.child(1).label

But I don't know how to get the content of the child. Also, I don't know how to iterate through the child objects.

Upvotes: 2

Views: 2482

Answers (1)

Travis Brown

Reputation: 139038

It's unclear where your HTML5Parser class comes from, but I'm going to assume it's the one in this example (or something similar). In that case, your htmlObject is just a scala.xml.Node. First, some setup:

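import scala.xml.Source  // assumed: loadXML takes an org.xml.sax.InputSource

val html = new models.HTML5Parser  // the parser class from the question

val source = Source.fromString(
  "<html><head/><body><div class='main'><span>test</span></div></body></html>"
)

val htmlObject = html.loadXML(source)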

Now you can do the following, for example:

scala> htmlObject.child(1).label
res0: String = body

scala> htmlObject.child(1).child(0).child(0).text
res1: String = test

scala> (htmlObject \\ "span").text
res2: String = test

scala> (htmlObject \ "body" \ "div" \ "span").text
res3: String = test

scala> (htmlObject \\ "div").head.attributes.asAttrMap
res4: Map[String,String] = Map(class -> main)

Etcetera.
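
For the iteration part of the question, here is a minimal sketch of walking the children of a scala.xml.Node. It assumes the scala-xml module is on the classpath and loads the same markup with XML.loadString as a stand-in for the HTML5Parser output, so the object name ChildIteration and its value names are purely illustrative. Note that \ selects direct children only, while \\ searches all descendants.

```scala
import scala.xml.{Elem, XML}

object ChildIteration {
  // Stand-in for the parser output (assumption): the same markup, loaded
  // directly with scala.xml because it happens to be well-formed XML.
  val htmlObject: Elem = XML.loadString(
    "<html><head/><body><div class='main'><span>test</span></div></body></html>"
  )

  // Direct children of the root, keeping only elements
  // (real pages also contain text and comment nodes between tags).
  val childLabels: Seq[String] =
    htmlObject.child.collect { case e: Elem => e.label }.toSeq

  // Label and text content of every descendant element, in document order.
  val labelsAndText: Seq[(String, String)] =
    htmlObject.descendant.collect { case e: Elem => (e.label, e.text) }

  // Attribute lookup: \@ returns the attribute value, or "" if it is absent.
  val divClasses: Seq[String] = (htmlObject \\ "div").map(_ \@ "class")

  def main(args: Array[String]): Unit = {
    childLabels.foreach(println)
    labelsAndText.foreach { case (label, text) => println(s"$label -> $text") }
    divClasses.foreach(println)
  }
}
```

Since child is an ordinary Seq[Node], the usual collection methods (map, filter, foreach, collect) all apply; pattern matching on Elem is just one way to skip the non-element nodes.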

Upvotes: 3
