Reputation: 67
I'm pulling down some markup from a URL and returning a single scala.xml.Node like so:
def doGoogleSearch(query:String) : scala.xml.Node = {
  val tmpUrl = "http://www.google.com?q="
  val tmp = tmpUrl.concat(query)
  val url = new URL(tmp)
  val conn = url.openConnection
  val sorce:InputSource = new InputSource
  val neo = new TagSoupFactoryAdapter
  val input = conn.getInputStream
  sorce.setByteStream(input)
  val markup = neo.loadXML(sorce)
  input.close
  return markup
}
Next I want to loop through each child element inside the markup, but what I have so far only seems to print twice, even though a huge amount of HTML is coming back. What am I missing here?
def loopThroughChildren(markup:scala.xml.Node) : String = {
  for (i <- 0 until markup.child.length) {
    //println(??
  }
  return ""
}
Thank you in advance!
Upvotes: 3
Views: 3670
Reputation: 16422
This will take a command line argument (filename) and print all tag labels found:
import scala.xml._
import scala.annotation.tailrec

object XmlTagLister extends App {
  require(args.length == 1, "You must provide an XML filename to be analyzed.")
  val data = XML.loadFile(args(0))

  @tailrec
  def collectTags(elems: List[Node], tags: Set[String]): Set[String] =
    elems match {
      case h :: t => collectTags((h \ "_").theSeq.toList ::: t, tags + h.label)
      case Nil => tags
    }

  val allTags = collectTags((data \ "_").toList, Set())
  allTags foreach println
}
Output looks like this:
onetag
AnotherTag
anothertag
(XML tags are case sensitive)
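As a shorter alternative to the hand-written recursion, the `\\ "_"` projection in scala.xml selects the node itself and every nested element. The sketch below swaps that in; the names `TagLabels`, `allLabels`, and `sample` are hypothetical, and unlike the listing above it also includes the root element's label.

```scala
import scala.xml._

object TagLabels {
  // Hypothetical helper: distinct labels of the root and every nested element.
  // `\\ "_"` matches the node itself plus all descendant elements (no text nodes).
  def allLabels(root: Node): Set[String] =
    (root \\ "_").map(_.label).toSet

  def main(args: Array[String]): Unit = {
    // Made-up sample document; <b> and <B> count as different tags.
    val sample = <a><b/><B/><b><c/></b></a>
    allLabels(sample).foreach(println)
  }
}
```

For the sample document this yields the four labels a, b, B and c; a Set carries no ordering, so sort the result if the print order matters.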
Upvotes: 0
Reputation: 15742
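Here's a recursive function for you:
def processNode(node: Node): Unit = {
  // Print this node if it is a non-blank text node, then recurse into its children
  if (node.isInstanceOf[Text] && node.text.trim.nonEmpty) println(node.text.trim)
  node.child foreach processNode
}
This will print the contents of all non-blank text nodes in the document. For example, if you feed it: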
<html>
  <head>
    <title>Welcome</title>
  </head>
  <body>
    <div>
      <p>Foo</p>
    </div>
  </body>
</html>
It will produce:
Welcome
Foo
Upvotes: 5
Reputation: 20515
Or, equivalently, iterate over the children directly:
for(child<-markup.child){
// child is a scala.xml.Node
}
Upvotes: 3
Reputation: 67898
As a simple solution, you could say
markup.child.map { child =>
// child is a scala.xml.Node
}
and possibly use recursion, depending on what you want to do.
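To illustrate why recursion may be needed: `child` only returns the direct children of a node, which would explain seeing just two items for an `<html>` root (`<head>` and `<body>`). A minimal sketch, assuming a small hand-written document in place of the fetched page; `ChildVsDescendant`, `directLabels`, and `allLabels` are made-up names, and `descendant` is used here as a ready-made way to reach all nested nodes:

```scala
import scala.xml._

object ChildVsDescendant {
  // Hypothetical sample standing in for the markup returned by the search.
  val markup: Node =
    <html><head><title>Welcome</title></head><body><div><p>Foo</p></div></body></html>

  // Direct children only, keeping elements and skipping text nodes.
  val directLabels: Seq[String] =
    markup.child.collect { case e: Elem => e.label }

  // Every nested element at any depth, in document order.
  val allLabels: Seq[String] =
    markup.descendant.collect { case e: Elem => e.label }

  def main(args: Array[String]): Unit = {
    println(directLabels.mkString(", ")) // head, body
    println(allLabels.mkString(", "))    // head, title, body, div, p
  }
}
```

Whether to use `descendant` or an explicit recursive function depends on whether you need to know each node's depth or parent while visiting it.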
Upvotes: 5