JimmyBond

Reputation: 67

How to loop over a list of children found inside a single scala.xml.Node

I'm downloading some markup from a URL and returning it as a single scala.xml.Node, like so:

  import java.net.URL
  import org.xml.sax.InputSource

  // TagSoupFactoryAdapter is not shown here; it must be on the classpath.
  def doGoogleSearch(query: String): scala.xml.Node = {
    val url = new URL("http://www.google.com?q=" + query)
    val conn = url.openConnection

    val source = new InputSource
    val neo = new TagSoupFactoryAdapter
    val input = conn.getInputStream

    // Parse the response stream into a single root node, then release the stream.
    source.setByteStream(input)
    val markup = neo.loadXML(source)
    input.close()

    markup
  }

Next, I want to loop through each child element inside the markup, but what I have so far only seems to print twice, even though a large amount of HTML comes back. What am I missing here?

def loopThroughChildren(markup:scala.xml.Node) : String = {
    for (i <- 0 until markup.child.length) {
      //println(??
    }
  return ""
}

Thank you in advance!

Upvotes: 3

Views: 3670

Answers (4)

yǝsʞǝla

Reputation: 16422

This takes a filename as a command-line argument and prints every distinct tag label found below the root element:

import scala.xml._
import scala.annotation.tailrec

object XmlTagLister extends App {
  require(args.length == 1, "You must provide an XML filename to be analyzed.")

  val data = XML.loadFile(args(0))

  @tailrec
  def collectTags(elems: List[Node], tags: Set[String]): Set[String] =
    elems match {
      case h :: t => collectTags((h \ "_").theSeq.toList ::: t, tags + h.label)
      case Nil => tags
    }

  val allTags = collectTags((data \ "_").toList, Set())

  allTags foreach println
}

Output looks like this:

onetag
AnotherTag
anothertag

(XML tag names are case-sensitive, so AnotherTag and anothertag are listed separately.)

Upvotes: 0

Knut Arne Vedaa

Reputation: 15742

Anyways, here's a recursive function for you:

import scala.xml.{Node, Text}

// Prints the content of every text node, depth-first.
def processNode(node: Node): Unit = {
  if (node.isInstanceOf[Text]) println(node.text)
  node.child foreach processNode
}

This will print the contents of all text nodes in the document. For example, given this input:

<html>
    <head>
        <title>Welcome</title>
    </head>
    <body>
        <div>
            <p>Foo</p>
        </div>
    </body>
</html>

It will produce the following, along with the whitespace-only text nodes that sit between the tags:

Welcome
Foo
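
If the indentation between tags should not show up in the output, the same traversal can collect only the non-blank text. This is a minimal sketch, not part of the original function: the names TextCollector and collectText are made up for illustration, and it assumes the scala-xml module is on the classpath.

```scala
import scala.xml._

object TextCollector {
  // Returns the trimmed content of every non-blank text node, in document order.
  def collectText(node: Node): List[String] = node match {
    case t: Text if t.text.trim.nonEmpty => List(t.text.trim)
    // Blank text nodes have no children, so they contribute nothing here.
    case other => other.child.toList.flatMap(collectText)
  }

  def main(args: Array[String]): Unit = {
    val doc = XML.loadString(
      "<html>\n  <head><title>Welcome</title></head>\n  <body><div><p>Foo</p></div></body>\n</html>")
    collectText(doc).foreach(println)
  }
}
```

Returning a List instead of printing directly also makes the result easier to reuse or test.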

Upvotes: 5

Dave Griffith

Reputation: 20515

Or, equivalently:

for (child <- markup.child) {
  // child is a scala.xml.Node
}
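
Note that child only holds the direct children of a node, which is probably why the loop in the question runs just twice: an html root typically has only head and body below it. A rough sketch of the difference, using a small inline document as a stand-in for the real response (the object and method names are hypothetical, and scala-xml is assumed to be on the classpath):

```scala
import scala.xml._

object ChildrenVsDescendants {
  // Labels of the direct child elements only.
  def directLabels(n: Node): List[String] =
    n.child.toList.collect { case e: Elem => e.label }

  // Labels of the node itself and every element below it, at any depth.
  def allLabels(n: Node): List[String] =
    (n \\ "_").toList.map(_.label)

  def main(args: Array[String]): Unit = {
    val markup: Node =
      <html><head><title>Welcome</title></head><body><div><p>Foo</p></div></body></html>

    println(directLabels(markup).mkString(", ")) // head, body
    println(allLabels(markup).mkString(", "))
  }
}
```

The `\\ "_"` projection visits every element, so it is one way to reach the nested markup without writing the recursion by hand.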

Upvotes: 3

Debilski

Reputation: 67898

As a simple solution, you could write:

markup.child.map { child =>
  // child is a scala.xml.Node
}

and possibly use recursion, depending on what you want to do.
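
As one possible shape for that recursion, here is a sketch that returns an indented outline of the element tree instead of printing inside the loop. The names Outline and outline are invented for this example, and it assumes scala-xml is available.

```scala
import scala.xml._

object Outline {
  // One entry per element, indented by two spaces per nesting level.
  // Text and other non-element nodes are skipped.
  def outline(node: Node, depth: Int = 0): List[String] = node match {
    case e: Elem =>
      (("  " * depth) + e.label) :: e.child.toList.flatMap(c => outline(c, depth + 1))
    case _ => Nil
  }

  def main(args: Array[String]): Unit = {
    val markup: Node =
      <html><head><title>Welcome</title></head><body><div><p>Foo</p></div></body></html>
    outline(markup).foreach(println)
  }
}
```

Because each call handles one node and delegates to its children, the same pattern works for collecting attributes, links, or anything else found at arbitrary depth.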

Upvotes: 5
